C Data Structures – Pure C Vector Implementation

cdata structures

I am implementing a vector in C. I am doing this for the fun of programming, for the fun of learning, and for the use of the data structure in later projects. This is not homework. My question is regarding methodology, not necessarily implementation, therefore I am not really looking for a code review.

Now, I have read over many implementation examples and most of them take the size of an individual element as an argument to the Vector_create() function, and use this for later memmove()/memcpy() calls and what not and more or less keep (for instance) the vector's elements in contiguous memory. In other words, the underlying array is essentially a single void*.

In the implementation I wrote before doing a lot of looking at how other people have done it, I allow(force) the client code to allocate an "object" and pass that as a pointer to the Vector_append() function. So my underlying array is a void**. So I guess the real question I am asking is if this is wrong. Wrong as in best practices wrong.

I see that the "normal" way of doing this doesn't use an array of pointers and instead just copies the memory of a struct (for example) into the underlying void* array and thereby keeping the actual data of the elements contiguous, as mentioned earlier. My way avoids a crap load of memmove()/memcpy() because I just copy pointer values, however, I am forcing the client to malloc() everything that is sent to the vector, and thereby do a bunch of screwing around on the heap.

Along those lines, I do call free() on the underlying data item when the item is removed, so the vector library does a free() that the vector library never did the original malloc() on. Is this bad practice too? The only trouble I seem to get into is if there was a member variable (for lack of a better term) that points to dynamic memory within the struct. It then becomes the client responsibility to free() that data, but again, the struct itself is freed from within the vector library call Vector_remove().

Best Answer

Yes, the design of your Vector is 'wrong'.

  • Such a vector is usually thought of as a resizeable array, with the associated expectation that, like an array, it directly contains the elements. Your requirement that Vector stores pointers to allocated elements breaks that expectation.
  • Storing pointers is inefficient if the elements to be stored are small (for example, integers or doubles), because the overhead of dynamic allocation becomes significant (2 to 3 times the data size) and you need to store both the pointer and the actual data, where in the traditional design, you only need to store the few bytes for the actual data. Additionally, in the traditional design, all the elements are in adjacent memory locations, which makes the typical access patterns (looping over the items) much more cache-friendly for the CPU than your design, where the elements are scattered around in memory.
  • Sometimes it is necessary to conceptually store items in multiple containers. With the conventional design, this can be accommodated by storing pointers in the containers and managing the actual objects yourself. This technique can also be used to limit the amount of copying that needs to be done, if the user of the container thinks that it is a reasonable trade-off. With your design, you would need to dynamically allocate the pointers that get stored in the container, which is far from convenient and completely counter-intuitive for most developers.
  • Any interface that requires a pointer to allocated memory is prone to misuse, because the compiler can't see that Vector_append(&a) is an error and to a human reviewer it is also not immediately obvious. If you are lucky, this is detected during testing, but in some cases it will go undetected for a very long time.
  • The fact that Vector_remove() deletes the data item will also not sit nicely in most development circles, because
    • Many coding guidelines require that corresponding malloc and free calls should be made from the same module. Your design makes it impossible to keep that requirement.
    • It is common practice to create, for complex structures that contain pointers to allocated memory, to create a pair of 'constructor'/'destructor' functions that take care of all the allocations/cleanup in one go, including the 'root' struct. This also does not work correctly with Vector_remove calling free.
Related Topic