C# – Data structure for accessing units of measure


TL;DR – I'm trying to design an optimal data structure to define units within a unit of measure.

A unit of measure is essentially a value (or quantity) associated with a unit. SI has seven base units, one for each base dimension: length, mass, time, electric current, temperature, amount of substance (moles), and luminous intensity.

This would be straightforward enough, but there are a number of derived units as well as rates that we frequently use. An example derived unit would be the newton (kg * m / s^2), and an example rate would be tons / hr.

We have an application that relies heavily upon implied units. We'll embed the units within the variable or column name. But this creates problems when we need to specify a unit of measure with different units. Yes, we can convert the values at input and display but this generates a lot of overhead code that we'd like to encapsulate within its own class.

There are a number of solutions out on CodePlex and other collaborative environments. The licensing for the projects is agreeable, but each project usually ends up being either too lightweight or too heavy. We're chasing our own unicorn of "just right."

Ideally, I could define a new unit of measure using something like this:

UOM myUom1 = new UOM(10, volts);
UOM myUom2 = new UOM(43.2, Newtons);

Of course, we use a mix of Imperial and SI units based upon our clients' needs.

We also need to keep this structure of units synchronized with a future database table so we can provide the same degree of consistency within our data too.

What's the best way of defining the units, derived units, and rates that we need to use to create our unit of measure class? I could see using one or more enums, but that could be frustrating for other developers. A single enum would be huge with 200+ entries whereas multiple enums could be confusing based upon SI vs Imperial units and additional breakdown based upon categorization of the unit itself.

Enum examples showing some of my concerns:

myUnits.Volt
myUnits.Newton
myUnits.meter

SIUnit.meter
ImpUnit.foot
DrvdUnit.Newton
DrvdUnitSI.Newton
DrvdUnitImp.FtLbs

Our set of units in use is pretty well defined and it's a finite space. We do need the ability to expand and add new derived units or rates when we have client demand for them. The project is in C# although I think the broader design aspects are applicable to multiple languages.

One of the libraries I looked at allows for free-form input of units via string. Their UOM class then parsed the string and slotted things accordingly. The challenge with this approach is that it forces the developer to think and remember what the correct string formats are. And I run the risk of a runtime error / exception if we don't add additional checks within the code to validate the strings being passed in the constructor.

Another library created too many classes for the developer to work with. Along with an equivalent UOM, it provided a DerivedUnit, a RateUnit, and so on. The code was overly complex for the problems we're solving. That library would allow any:any combinations (which is legitimate in the units world), but we're happy to narrow our problem (and simplify our code) by not allowing every possible combination.

Other libraries were ridiculously simple and hadn't even considered operator overloading for example.

In addition, I'm not as worried about attempts at incorrect conversions (for example, volts to meters). For now, developers are the only ones who will work at this level, and we don't necessarily need to protect against those types of mistakes.

Best Answer

The Boost libraries for C++ include an article on dimensional analysis that presents a sample implementation of handling units of measure.

To summarize: Units of measurement are represented as vectors, with each element of the vector representing a fundamental dimension:

typedef int dimension[7]; // m  l  t  ...
dimension const mass      = {1, 0, 0, 0, 0, 0, 0};
dimension const length    = {0, 1, 0, 0, 0, 0, 0};
dimension const time      = {0, 0, 1, 0, 0, 0, 0};

Derived units are combinations of these. For example, force (mass * distance / time^2) would be represented as

dimension const force  = {1, 1, -2, 0, 0, 0, 0};
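The same idea can be sketched in C#. This is only an illustration, assuming a hypothetical Dimension class (it is not taken from Boost or from any existing .NET library): multiplying two quantities adds their exponent vectors, and dividing subtracts them.

```csharp
using System;
using System.Linq;

// Hypothetical C# counterpart of the C++ `dimension` array above.
public sealed class Dimension : IEquatable<Dimension>
{
    // Exponent order: mass, length, time, then the other four SI base dimensions.
    private readonly int[] _exponents;

    public Dimension(params int[] exponents)
    {
        if (exponents.Length != 7)
            throw new ArgumentException("Expected seven exponents.", nameof(exponents));
        _exponents = (int[])exponents.Clone();
    }

    public static readonly Dimension Mass   = new Dimension(1, 0, 0, 0, 0, 0, 0);
    public static readonly Dimension Length = new Dimension(0, 1, 0, 0, 0, 0, 0);
    public static readonly Dimension Time   = new Dimension(0, 0, 1, 0, 0, 0, 0);

    // Multiplying quantities adds exponents element by element.
    public static Dimension operator *(Dimension a, Dimension b) =>
        new Dimension(a._exponents.Zip(b._exponents, (x, y) => x + y).ToArray());

    // Dividing quantities subtracts exponents element by element.
    public static Dimension operator /(Dimension a, Dimension b) =>
        new Dimension(a._exponents.Zip(b._exponents, (x, y) => x - y).ToArray());

    public bool Equals(Dimension other) =>
        other != null && _exponents.SequenceEqual(other._exponents);

    public override bool Equals(object obj) => Equals(obj as Dimension);

    public override int GetHashCode() => _exponents.Aggregate(17, (h, e) => h * 31 + e);

    public override string ToString() => "{" + string.Join(", ", _exponents) + "}";
}
```

With this sketch, Dimension.Mass * Dimension.Length / (Dimension.Time * Dimension.Time) yields the same exponents as the force vector above.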

Imperial versus SI units could be handled by adding a conversion factor.

This implementation relies on C++-specific techniques (using template metaprogramming to easily turn different units of measurement into different compile-time types), but the concepts should transfer to other programming languages.
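As a rough illustration of how the conversion-factor idea might look in C#, here is a minimal sketch. All names (Unit, Units, UOM, FactorToSI, ConvertTo) are assumptions made for this example, and only three of the seven dimensions are tracked to keep it short. Unlike the Boost approach, mismatched dimensions are detected at run time rather than at compile time.

```csharp
using System;

// Hypothetical sketch; none of these names come from Boost or an existing library.
public sealed class Unit
{
    public string Symbol { get; }

    // Exponents of (mass, length, time); the other four SI bases are omitted for brevity.
    public (int Mass, int Length, int Time) Dimension { get; }

    // Multiply a value in this unit by FactorToSI to get the value in the coherent SI unit.
    public double FactorToSI { get; }

    public Unit(string symbol, (int, int, int) dimension, double factorToSI)
    {
        Symbol = symbol;
        Dimension = dimension;
        FactorToSI = factorToSI;
    }
}

// A fixed catalogue of units, discoverable through IntelliSense like an enum.
public static class Units
{
    public static readonly Unit Meter      = new Unit("m",   (0, 1, 0),  1.0);
    public static readonly Unit Foot       = new Unit("ft",  (0, 1, 0),  0.3048);
    public static readonly Unit Newton     = new Unit("N",   (1, 1, -2), 1.0);
    public static readonly Unit PoundForce = new Unit("lbf", (1, 1, -2), 4.4482216152605);
}

public readonly struct UOM
{
    public double Value { get; }
    public Unit Unit { get; }

    public UOM(double value, Unit unit)
    {
        Value = value;
        Unit = unit;
    }

    // Converts via the SI value; throws if the two units measure different dimensions.
    public UOM ConvertTo(Unit target)
    {
        if (!Unit.Dimension.Equals(target.Dimension))
            throw new InvalidOperationException($"Cannot convert {Unit.Symbol} to {target.Symbol}.");
        return new UOM(Value * Unit.FactorToSI / target.FactorToSI, target);
    }

    // The sum is expressed in the left operand's unit.
    public static UOM operator +(UOM a, UOM b) =>
        new UOM(a.Value + b.ConvertTo(a.Unit).Value, a.Unit);

    public override string ToString() => $"{Value} {Unit.Symbol}";
}
```

Usage would then resemble the syntax asked for in the question, for example new UOM(43.2, Units.Newton) or new UOM(10, Units.Foot).ConvertTo(Units.Meter). A single multiplicative factor is not enough for units with an offset, such as degrees Celsius or Fahrenheit, so those would need extra handling.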

Related Solutions

C# – Nested Enum type in C++ or C#

I agree with others that this seems overengineered. Usually, you want either a simple enum or a complex hierarchy of classes; it's not a good idea to combine the two.

But if you really want to do this (in C#), I think it's useful to recap what exactly you want:

Separate types for the hierarchy levels Kingdom, Phylum, etc., which do not form an inheritance hierarchy (otherwise, a Phylum could be assigned to a Kingdom), though they could inherit from a common base class.
Each expression like Animalia.Chordata.Aves has to be assignable to a variable, which means we have to work with instances, not nested static types. This is especially problematic for the root type, because there are no global variables in C#. You could solve that by using a singleton. Also, I think there should be only one root, so the code above would become something like Organisms.Instance.Animalia.Chordata.Aves.
Each member has to be a different type, so that Animalia.Chordata compiles but Plantae.Chordata doesn't.
Each member needs to somehow know all its children, for the IsMember() method to work.

The way I would implement these requirements is to start with a class like EnumSet<TChild> (though the name could be better), where TChild is the type of the children at this level of the hierarchy. This class would also contain a collection of all its children (how it gets filled is covered further down). We also need another type to represent the leaf level of the hierarchy, the non-generic EnumSet:

abstract class EnumSet
{}

abstract class EnumSet<TChild> : EnumSet where TChild : EnumSet
{
    protected IEnumerable<TChild> Children { get; private set; }

    public bool Contains(TChild child)
    {
        return Children.Contains(child);
    }
}

Now we need to create a class for each level in the hierarchy:

abstract class Root : EnumSet<Kingdom>
{}

abstract class Kingdom : EnumSet<Phylum>
{}

abstract class Phylum : EnumSet
{}

And finally some concrete classes:

class Organisms : Root
{
    public static readonly Organisms Instance = new Organisms();

    private Organisms()
    {}

    public readonly Animalia Animalia = new Animalia();
    public readonly Plantae Plantae = new Plantae();
}

class Plantae : Kingdom
{
    public readonly Anthophyta Anthophyta = new Anthophyta();
}

class Anthophyta : Phylum
{}

class Animalia : Kingdom
{
    public readonly Chordata Chordata = new Chordata();
}

class Chordata : Phylum
{}

Notice that children are always fields of the parent class. What this means is that to fill the Children collection, we can use reflection:

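// Constructor of EnumSet<TChild>; needs using System.Linq and System.Reflection.
// In C#, a derived class's field initializers run before the base constructor body,
// so the child fields are already populated when this reflection code executes.
protected EnumSet()
{
    Children = GetType().GetFields(BindingFlags.Instance | BindingFlags.Public)
                        .Select(f => f.GetValue(this))
                        .Cast<TChild>()
                        .ToArray();
}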

One problem with this approach is that Contains() only works one level down. So, you can do Organisms.Instance.Contains(animalia), but not .Contains(chordata). You can work around that by adding overloads of Contains() to the specific hierarchy classes, e.g.:

abstract class Root : EnumSet<Kingdom>
{
    public bool Contains(Phylum phylum)
    {
        return Children.Any(c => c.Contains(phylum));
    }
}

But this would be a lot of work for deep hierarchies.
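One possible way to avoid an overload per level is a virtual search declared on the non-generic base class. The sketch below assumes a hypothetical ContainsDeep method that is not part of the design above, and it trades away compile-time checking of the argument type: any EnumSet can be passed, and the answer is simply false if it is not a descendant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

abstract class EnumSet
{
    // Leaves have no children, so the default search finds nothing.
    public virtual bool ContainsDeep(EnumSet item) => false;
}

abstract class EnumSet<TChild> : EnumSet where TChild : EnumSet
{
    protected IReadOnlyList<TChild> Children { get; }

    protected EnumSet()
    {
        // Derived-class field initializers have already run at this point.
        Children = GetType().GetFields(BindingFlags.Instance | BindingFlags.Public)
                            .Select(f => f.GetValue(this))
                            .OfType<TChild>()
                            .ToArray();
    }

    // Direct children only, with a compile-time check on the argument type.
    public bool Contains(TChild child) => Children.Contains(child);

    // Hypothetical addition: searches all levels below this one.
    public override bool ContainsDeep(EnumSet item) =>
        Children.Any(c => ReferenceEquals(c, item) || c.ContainsDeep(item));
}

abstract class Kingdom : EnumSet<Phylum> {}
abstract class Phylum : EnumSet {}

sealed class Chordata : Phylum {}
sealed class Anthophyta : Phylum {}

sealed class Animalia : Kingdom
{
    public readonly Chordata Chordata = new Chordata();
}

sealed class Plantae : Kingdom
{
    public readonly Anthophyta Anthophyta = new Anthophyta();
}

sealed class Organisms : EnumSet<Kingdom>
{
    public static readonly Organisms Instance = new Organisms();

    private Organisms() {}

    public readonly Animalia Animalia = new Animalia();
    public readonly Plantae Plantae = new Plantae();
}
```

With this in place, Organisms.Instance.ContainsDeep(Organisms.Instance.Animalia.Chordata) returns true without a dedicated overload on the root class.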

After all of this, you end up with quite a lot of repetitive code. One way to fix that would be to have a text file that describes the hierarchy and use a T4 template to generate all the classes based on that.

C# – Most efficient data structure for implementing inheritance structure without classes

If there are no functions to inherit, composition of structs may be preferable to class inheritance. The concept of "composition over inheritance" may apply here; see the Wikipedia article of that name.
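A minimal sketch of what composing structs could look like in C#. The type names (Position, Label, Marker) are made up for illustration and do not come from the original question.

```csharp
// Hypothetical types, for illustration only.
public readonly struct Position
{
    public double X { get; }
    public double Y { get; }

    public Position(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public readonly struct Label
{
    public string Text { get; }

    public Label(string text)
    {
        Text = text;
    }
}

// Instead of deriving Marker from Position, Marker is composed of the parts it needs.
public readonly struct Marker
{
    public Position Position { get; }
    public Label Label { get; }

    public Marker(Position position, Label label)
    {
        Position = position;
        Label = label;
    }
}
```

Shared data is then reached through the contained part, for example marker.Position.X, rather than through an inherited member.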
