C++ – Designing an in-memory table in C++

cc++11

I'm evaluating my options to structure an in-memory database and I have a few ideas of how to implement it. I would like to know your opinion of what the best design choice is.

I have a column class which is parametrized to represent different column types.

template<typename T>
class Column<T> {
public:
   std::string name();
   T sum();
   T avg();
   ...
private:
   std::string name;
   std::vector<T> vec;
   ...
};

I'm not too sure what the best route to store a vector of Column with different type parameters. For example a 3-column table might have one integer column, float column and string column.

I know there is boost::variant but I'm not allowed to use boost.

I was thinking of using one of the following:

Tagged Union
Pure OO: Extend the column like IntColumn : Column, etc.

What are your thoughts? Got a better idea?

Best Answer

Because the type of a column is a template parameter, you are modelling the column type within the C++ type system. This is good. A Column<int> and Column<std::string> are different types. If there are some properties that are common for all column types (e.g. that a column has a name), you could extract these into a base class so that these common operations can be accessed via a common type. However, no type-specific operations like get() or sum() can exist in this base, and must be part of the templated Column<T>.

If you have a table type that has columns of different types, it is clearly not sensible to force these to have the same type since you would necessarily lose access to the template parameter (“type erasure”). Instead, embrace the different types and make your Table strongly typed as well. A container like std::tuple<T...> can help here.

If you need access to the column-type independent parts, you can always get a pointer to the column that can be used as the base type.

A sketch using C++14 (C++11 would require you to implement a couple of convenience functions yourself, but has std::tuple and template parameter packs):

class ColumnBase {
  ...
public:
  std::string name() { … }
};

template<class T>
class Column : public ColumnBase {
  std::vector<T> m_items;
  ...
};

template<class... T>
class Table {
  std::tuple<Column<T>...> m_columns;

  template<std::size_t... index>
  std::vector<ColumnBase*> columns_vec_helper(std::index_sequence<index...>) {
    return { (&std::get<index>(m_columns))... };
  }

public:
  std::vector<ColumnBase*> columns_vec() {
    return columns_vec_helper(std::make_index_sequence<sizeof...(T)>{});
  }
};

We could then print out the name of all columns:

for (const auto& colBase : table.columns_vec())
  std::cout << "column " << colBase->name() << "\n";

without having to handle each column type separately.

(runnable demo on ideone)

Only templates will give you the type safety that you get an int out of an integer column. In contrast, unions/variant types require the using code to remember all possible types (with template, the type checker enforces that we handle everything). With subtyping, we can't have column-type specific operations that share an implementation. I.e. a method int IntColumn::get(std::size_t i) and a related method const std::string& StringColumn::get(std::size_t i) might look like they have a common interface, but that would be only accidental and cannot be enforced. In particular, any combination of virtual methods and templates in C++ gets very ugly, very fast.

The disadvantage of templates is that you will be required to carefully write generic code, and will have to do template metaprogramming. When done correctly the results can have amazing usability, but the implementation would be advanced C++. If your design is intended to be maintained by less advanced programmers (which will be as baffled as I will be when I look back at this code in a couple of months), then it might be more sensible to avoid such a “clever” solution despite its benefits and use more traditional OOP patterns that give you a similar structure, but might require a couple of static_casts to work.

Related Solutions

C++ – Designing exception classes

I think your colleague was right: you are designing your exception cases based on how simple it is to implement within the hierarchy, not based on the exception-handling needs of the client code.

With one exception type and an enumeration for the error condition (your solution), if the client code needs to handle single error cases (for example, my_errc::error_x) they must write code like this:

try {
    your_library.exception_thowing_function();
} catch(const my_error& err) {
    switch(err.code()) { // this could also be an if
    case my_errc::error_x:
        // handle error here
        break;
    default:
        throw; // we are not interested in other errors
    }
}

With multiple exception types (having a common base for the entire hierarchy), you can write:

try {
    your_library.exception_thowing_function();
} catch(const my_error_x& err) {
    // handle error here
}

where exception classes look like this:

// base class for all exceptions in your library
class my_error: public std::runtime_error { ... };

// error x: corresponding to my_errc::error_x condition in your code
class my_error_x: public my_error { ... };

When writing a library, the focus should be on it's ease of use, not (necessarily) ease of the internal implementation.

You should only compromize the ease of use (how client code will look like) when the effort of doing it right in the library, is prohibitive.

C++ – How to store various sized values in a vector

Firstly, command classes are decoupled from the medium and protocol. That means you can design the command classes for your own programming convenience, rather than having to design it to match exactly to the specifics of each protocol (which would be impossible, since different protocols may have different bit widths for the same command and field).

When I mention convenience, what I mean is that you can use the maximum bit width you'll ever need for each command's fields.

However, you may still need to have device- or protocol-specific validation code, since each device or protocol imposes its own limits to what values can be in those fields. Unless you don't plan to implement any validation at all.

When it comes to validation, there are several choices:

Not doing it at all, if you will be doing all of the programming yourself, and if it is a hobby project such that mistakes do not result in damages.
Validating it eagerly, i.e. in the command class. This may be difficult, since a command class might not know which device or protocol it will be sent to.
Validating it late, i.e. in the protocol class where the command values are being converted into bytes.

For example, even if a validation rule says that a particular field can only have a value in the range 0 - 100, it doesn't stop you from using a uint32_t or int32_t for that field in the command class.

To the second question of having an overloaded method that takes in various built-in number types and append the bytes to an internal byte vector, do notice the caveats.

In my opinion, if you only needs to work with the fundamental integer types, you don't need templates. Instead, you simply provide function overloads for each of the types, and you call the functions with a value of the appropriate type.

void CPacket::addData(uint32_t data) { ... }
void CPacket::addData(int32_t data) { ... }
void CPacket::addData(uint16_t data) { ... }
void CPacket::addData(int16_t data) { ... }
...

Regarding the code inside, there are several choices:

Type punning with union. This assumes that your code will work exclusively with one byte-endianness, thus not needing to consider the possibility of porting to a different byte-endianness.

union
{
    uint32_t value;
    uint8_t bytes[4];
} pun = { data };
// after that, add the bytes to the vector one-by-one, according to the byte endianness of the communication.

Explicitly extracting the bytes with endian-agnostic bitwise arithmetic: (see note on casting)

// only if value is unsigned. For signed value, it must first be cast to unsigned
uint8_t byte0 = (uint8_t)value;
uint8_t byte1 = (uint8_t)(value >> 8ul);
uint8_t byte2 = (uint8_t)(value >> 16ul);
uint8_t byte3 = (uint8_t)(value >> 24ul);

Best Answer

Related Solutions

C++ – Designing exception classes

C++ – How to store various sized values in a vector

Related Topic