C++ – Designing an in-memory table in C++

c c++11

I'm evaluating my options to structure an in-memory database and I have a few ideas of how to implement it. I would like to know your opinion of what the best design choice is.

I have a column class which is parametrized to represent different column types.

template<typename T>
class Column {
public:
   std::string name();
   T sum();
   T avg();
   ...
private:
   std::string m_name;
   std::vector<T> vec;
   ...
};

I'm not sure what the best way is to store a vector of Columns with different type parameters. For example, a 3-column table might have one integer column, one float column and one string column.

I know there is boost::variant but I'm not allowed to use boost.

I was thinking of using one of the following:

  1. Tagged Union
  2. Pure OO: Extend the column like IntColumn : Column, etc.

What are your thoughts? Got a better idea?

Best Answer

Because the type of a column is a template parameter, you are modelling the column type within the C++ type system. This is good. A Column<int> and a Column<std::string> are different types. If some properties are common to all column types (e.g. that a column has a name), you could extract these into a base class so that the common operations can be accessed via a common type. However, type-specific operations like get() or sum() cannot live in this base class; they must be part of the templated Column<T>.

If you have a table type that has columns of different types, it is clearly not sensible to force these to have the same type since you would necessarily lose access to the template parameter (“type erasure”). Instead, embrace the different types and make your Table strongly typed as well. A container like std::tuple<T...> can help here.

If you need access to the column-type independent parts, you can always take a pointer to the column and use it as a pointer to the base class.

A sketch using C++14 (C++11 would require you to implement a couple of convenience functions yourself, but has std::tuple and template parameter packs):

class ColumnBase {
  ...
public:
  std::string name() { … }
};

template<class T>
class Column : public ColumnBase {
  std::vector<T> m_items;
  ...
};

template<class... T>
class Table {
  std::tuple<Column<T>...> m_columns;

  template<std::size_t... index>
  std::vector<ColumnBase*> columns_vec_helper(std::index_sequence<index...>) {
    return { (&std::get<index>(m_columns))... };
  }

public:
  std::vector<ColumnBase*> columns_vec() {
    return columns_vec_helper(std::make_index_sequence<sizeof...(T)>{});
  }
};

We could then print out the name of all columns:

for (const auto& colBase : table.columns_vec())
  std::cout << "column " << colBase->name() << "\n";

without having to handle each column type separately.

(runnable demo on ideone)
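For reference, here is one way the sketch above could be filled in so that it compiles as C++14. Everything the sketch leaves elided is an assumption made for illustration: the constructors, push_back(), get(), the column<I>() accessor and the demo_column_names() helper are hypothetical names, not part of the original design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Column-type independent part: everything that does not depend on T.
class ColumnBase {
public:
  explicit ColumnBase(std::string name) : m_name(std::move(name)) {}
  const std::string& name() const { return m_name; }
private:
  std::string m_name;
};

// Typed column: the element type is part of the C++ type.
template<class T>
class Column : public ColumnBase {
public:
  using ColumnBase::ColumnBase;  // inherit the name-taking constructor
  void push_back(T value) { m_items.push_back(std::move(value)); }
  const T& get(std::size_t i) const {
    assert(i < m_items.size());
    return m_items[i];
  }
private:
  std::vector<T> m_items;
};

template<class... T>
class Table {
public:
  // Assumed constructor shape: one already-named column per type.
  explicit Table(Column<T>... columns) : m_columns(std::move(columns)...) {}

  // Typed access: column<0>() on a Table<int, ...> is a Column<int>&.
  template<std::size_t I>
  auto& column() { return std::get<I>(m_columns); }

  // Untyped access: pointers to the common base of every column.
  std::vector<ColumnBase*> columns_vec() {
    return columns_vec_helper(std::make_index_sequence<sizeof...(T)>{});
  }
private:
  template<std::size_t... index>
  std::vector<ColumnBase*> columns_vec_helper(std::index_sequence<index...>) {
    return { (&std::get<index>(m_columns))... };
  }
  std::tuple<Column<T>...> m_columns;
};

// Hypothetical usage: collect all column names through the base pointers.
std::vector<std::string> demo_column_names() {
  Table<int, double, std::string> table{
      Column<int>("id"), Column<double>("price"), Column<std::string>("label")};
  table.column<0>().push_back(42);  // statically known to take an int
  std::vector<std::string> names;
  for (const ColumnBase* col : table.columns_vec())
    names.push_back(col->name());
  return names;
}
```

The base-pointer vector is only a view into the tuple, so it must not outlive the table it was created from.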

Only templates give you the type-safety guarantee that you get an int out of an integer column. In contrast, unions/variant types require the using code to remember all possible types (with templates, the type checker enforces that we handle everything). With subtyping, we can't have column-type specific operations that share an implementation. For example, a method int IntColumn::get(std::size_t i) and a related method const std::string& StringColumn::get(std::size_t i) might look like they share a common interface, but that would only be accidental and cannot be enforced. In particular, any combination of virtual methods and templates in C++ gets very ugly, very fast.
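To illustrate the point about the type checker, here is a small C++14 sketch of a generic operation applied to every column of a tuple. The for_each_column helper, the simplified Column struct and describe_first_row() are made up for this example. Because the generic lambda is instantiated once per column type, code that does not work for one of the column types fails to compile instead of failing at run time.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Minimal stand-in for the answer's Column<T> (public members are an assumption).
template<class T>
struct Column {
  std::string name;
  std::vector<T> items;
};

// Calls f(column) once per tuple element. Each call sees the column's real
// static type, so f has to compile for every column type in the table.
template<class Tuple, class F, std::size_t... I>
void for_each_column_impl(Tuple& columns, F&& f, std::index_sequence<I...>) {
  // Pre-C++17 pack expansion trick: a braced list evaluates left to right.
  int unused[] = { 0, ((void)f(std::get<I>(columns)), 0)... };
  (void)unused;
}

template<class... T, class F>
void for_each_column(std::tuple<Column<T>...>& columns, F&& f) {
  for_each_column_impl(columns, std::forward<F>(f),
                       std::index_sequence_for<T...>{});
}

// Hypothetical usage: format the first row of a two-column table.
std::string describe_first_row() {
  std::tuple<Column<int>, Column<std::string>> columns{
      Column<int>{"id", {7, 8}},
      Column<std::string>{"label", {"a", "b"}}};
  std::ostringstream out;
  for_each_column(columns, [&out](const auto& col) {
    out << col.name << "=" << col.items.at(0) << ";";
  });
  return out.str();
}
```

With a tagged union, the equivalent code would be a switch over the tag, and nothing forces that switch to cover a column type added later.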

The disadvantage of templates is that you will be required to carefully write generic code, and will have to do template metaprogramming. When done correctly the results can have amazing usability, but the implementation would be advanced C++. If your design is intended to be maintained by less advanced programmers (who will be as baffled as I will be when I look back at this code in a couple of months), then it might be more sensible to avoid such a “clever” solution despite its benefits and use more traditional OOP patterns that give you a similar structure, but might require a couple of static_casts to work.
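As a rough idea of what the more traditional variant might look like, here is a sketch in which the table owns its columns through base-class pointers and a type tag tells the caller which static_cast is safe. The ColumnType enum, the IntColumn and StringColumn classes, the Table interface and the helper functions are all assumptions made for illustration. It uses std::make_unique from C++14; in C++11 you would construct the std::unique_ptr directly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class ColumnType { Int, String };  // hypothetical tag set

class ColumnBase {
public:
  ColumnBase(std::string name, ColumnType type)
      : m_name(std::move(name)), m_type(type) {}
  virtual ~ColumnBase() = default;
  const std::string& name() const { return m_name; }
  ColumnType type() const { return m_type; }
private:
  std::string m_name;
  ColumnType m_type;
};

class IntColumn : public ColumnBase {
public:
  explicit IntColumn(std::string name)
      : ColumnBase(std::move(name), ColumnType::Int) {}
  std::vector<int> items;
};

class StringColumn : public ColumnBase {
public:
  explicit StringColumn(std::string name)
      : ColumnBase(std::move(name), ColumnType::String) {}
  std::vector<std::string> items;
};

// Runtime-typed table: the column types are no longer visible to the compiler.
class Table {
public:
  void add(std::unique_ptr<ColumnBase> column) {
    m_columns.push_back(std::move(column));
  }
  ColumnBase& column(std::size_t i) { return *m_columns.at(i); }
private:
  std::vector<std::unique_ptr<ColumnBase>> m_columns;
};

// The caller must check the tag before downcasting. A wrong static_cast is
// undefined behaviour, and the compiler cannot catch it.
int sum_int_column(ColumnBase& column) {
  assert(column.type() == ColumnType::Int);
  auto& ints = static_cast<IntColumn&>(column);
  int total = 0;
  for (int v : ints.items) total += v;
  return total;
}

// Hypothetical usage: build a two-column table and sum the integer column.
int demo_sum() {
  Table table;
  auto ids = std::make_unique<IntColumn>("id");
  ids->items = {1, 2, 3};
  table.add(std::move(ids));
  table.add(std::make_unique<StringColumn>("label"));
  return sum_int_column(table.column(0));
}
```

This is easier to read than the tuple-based version, at the price of moving the type check from compile time to run time.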
