C++ – How to safely pass objects, especially STL objects, to and from a DLL

abicdllstlwindows

How do I pass class objects, especially STL objects, to and from a C++ DLL?

My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.

Best Answer

The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.

Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.

If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:

Data packing/alignment

Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.

If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.

You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.

Member reordering

If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any data rearranging can cause incompatibilities between compilers. Passing data back and forth to a DLL will require standard-layout classes, therefore.

Calling convention

There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?

It's important you maintain a standard calling convention; if you declare a function as _cdecl, the default for C++, and try to call it using _stdcall bad things will happen. _cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying an _stdcall in one place and a _cdecl in another.

Datatype size

According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.

Heap issues

If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.

To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.

STL issues

The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.

Name mangling

Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.

You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.

You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:

GCC:

EXPORTS
    GetCCDLL=_Z8GetCCDLLv @1

MSVC:

EXPORTS
    GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ @1

Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.

This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.

Passing class objects to a function

This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.

Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.

//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries

//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
  void* pod_malloc(size_t size)
  {
    HANDLE heapHandle = GetProcessHeap();
    HANDLE storageHandle = nullptr;

    if (heapHandle == nullptr)
    {
      return nullptr;
    }

    storageHandle = HeapAlloc(heapHandle, 0, size);

    return storageHandle;
  }

  void pod_free(void* ptr)
  {
    HANDLE heapHandle = GetProcessHeap();
    if (heapHandle == nullptr)
    {
      return;
    }

    if (ptr == nullptr)
    {
      return;
    }

    HeapFree(heapHandle, 0, ptr);
  }
}

//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
  pod();
  pod(const T& value);
  pod(const pod& copy);
  ~pod();

  pod<T>& operator=(pod<T> value);
  operator T() const;

  T get() const;
  void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)

//POD_basic_types.h: holds pod specializations for basic datatypes.
#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
  //these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
  typedef int original_type;
  typedef std::int32_t safe_type;

public:
  pod() : data(nullptr) {}

  pod(const original_type& value)
  {
    set_from(value);
  }

  pod(const pod<original_type>& copyVal)
  {
    original_type copyData = copyVal.get();
    set_from(copyData);
  }

  ~pod()
  {
    release();
  }

  pod<original_type>& operator=(pod<original_type> value)
  {
    swap(*this, value);

    return *this;
  }

  operator original_type() const
  {
    return get();
  }

protected:
  safe_type* data;

  original_type get() const
  {
    original_type result;

    result = static_cast<original_type>(*data);

    return result;
  }

  void set_from(const original_type& value)
  {
    data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.

    if (data == nullptr)
    {
      return;
    }

    new(data) safe_type (value);
  }

  void release()
  {
    if (data)
    {
      pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
      data = nullptr;
    }
  }

  void swap(pod<original_type>& first, pod<original_type>& second)
  {
    using std::swap;

    swap(first.data, second.data);
  }
};
#pragma pack(pop)

The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded = and () operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to a int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).

We can also wrap STL types in this way, although it requires a little extra work:

#pragma pack(push, 1)
template<typename charT>
class pod<std::basic_string<charT>> //double template ftw. We're specializing pod for std::basic_string, but we're making this specialization able to be specialized for different types; this way we can support all the basic_string types without needing to create four specializations of pod.
{
  //more comfort typedefs
  typedef std::basic_string<charT> original_type;
  typedef charT safe_type;

public:
  pod() : data(nullptr) {}

  pod(const original_type& value)
  {
    set_from(value);
  }

  pod(const charT* charValue)
  {
    original_type temp(charValue);
    set_from(temp);
  }

  pod(const pod<original_type>& copyVal)
  {
    original_type copyData = copyVal.get();
    set_from(copyData);
  }

  ~pod()
  {
    release();
  }

  pod<original_type>& operator=(pod<original_type> value)
  {
    swap(*this, value);

    return *this;
  }

  operator original_type() const
  {
    return get();
  }

protected:
  //this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
  safe_type* data;
  typename original_type::size_type dataSize;

  original_type get() const
  {
    original_type result;
    result.reserve(dataSize);

    std::copy(data, data + dataSize, std::back_inserter(result));

    return result;
  }

  void set_from(const original_type& value)
  {
    dataSize = value.size();

    data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));

    if (data == nullptr)
    {
      return;
    }

    //figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
    safe_type* dataIterPtr = data;
    safe_type* dataEndPtr = data + dataSize;
    typename original_type::const_iterator iter = value.begin();

    for (; dataIterPtr != dataEndPtr;)
    {
      new(dataIterPtr++) safe_type(*iter++);
    }
  }

  void release()
  {
    if (data)
    {
      pod_helpers::pod_free(data);
      data = nullptr;
      dataSize = 0;
    }
  }

  void swap(pod<original_type>& first, pod<original_type>& second)
  {
    using std::swap;

    swap(first.data, second.data);
    swap(first.dataSize, second.dataSize);
  }
};
#pragma pack(pop)

Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.

//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
  virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};

CCDLL_v1* GetCCDLL();

This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:

struct CCDLL_v1_implementation: CCDLL_v1
{
  virtual void ShowMessage(const pod<std::wstring>* message) override;
};

CCDLL_v1* GetCCDLL()
{
  static CCDLL_v1_implementation* CCDLL = nullptr;

  if (!CCDLL)
  {
    CCDLL = new CCDLL_v1_implementation;
  }

  return CCDLL;
}

And now let's implement the ShowMessage function:

#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
  std::wstring workingMessage = *message;

  MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}

Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a messagebox. After all, this is just a POC, not a full utility library.

Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)

Now for an EXE to call the DLL:

//main.cpp
#include "../CCDLL/CCDLL.h"

typedef CCDLL_v1*(__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;

int main()
{
  HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.

  Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, (LPCSTR)"GetCCDLL");
  CCDLL_v1* CCDLL_lib;

  CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.

  pod<std::wstring> message = TEXT("Hello world!");

  CCDLL_lib->ShowMessage(&message);

  FreeLibrary(ccdll); //unload the library when we're done with it

  return 0;
}

And here are the results. Our DLL works. We've successfully reached past STL ABI issues, past C++ ABI issues, past mangling issues, and our MSVC DLL is working with a GCC EXE.

In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.

Setting a bit

Use the bitwise OR operator (|) to set a bit.

number |= 1UL << n;

That will set the nth bit of number. n should be zero, if you want to set the 1st bit and so on upto n-1, if you want to set the nth bit.

Use 1ULL if number is wider than unsigned long; promotion of 1UL << n doesn't happen until after evaluating 1UL << n where it's undefined behaviour to shift by more than the width of a long. The same applies to all the rest of the examples.

Clearing a bit

Use the bitwise AND operator (&) to clear a bit.

number &= ~(1UL << n);

That will clear the nth bit of number. You must invert the bit string with the bitwise NOT operator (~), then AND it.

Toggling a bit

The XOR operator (^) can be used to toggle a bit.

number ^= 1UL << n;

That will toggle the nth bit of number.

Checking a bit

You didn't ask for this, but I might as well add it.

To check a bit, shift the number n to the right, then bitwise AND it:

bit = (number >> n) & 1U;

That will put the value of the nth bit of number into the variable bit.

Changing the nth bit to x

Setting the nth bit to either 1 or 0 can be achieved with the following on a 2's complement C++ implementation:

number ^= (-x ^ number) & (1UL << n);

Bit n will be set if x is 1, and cleared if x is 0. If x has some other value, you get garbage. x = !!x will booleanize it to 0 or 1.

To make this independent of 2's complement negation behaviour (where -1 has all bits set, unlike on a 1's complement or sign/magnitude C++ implementation), use unsigned negation.

number ^= (-(unsigned long)x ^ number) & (1UL << n);

unsigned long newbit = !!x;    // Also booleanize to force 0 or 1
number ^= (-newbit ^ number) & (1UL << n);

It's generally a good idea to use unsigned types for portable bit manipulation.

number = (number & ~(1UL << n)) | (x << n);

(number & ~(1UL << n)) will clear the nth bit and (x << n) will set the nth bit to x.

It's also generally a good idea to not to copy/paste code in general and so many people use preprocessor macros (like the community wiki answer further down) or some sort of encapsulation.

C# – Embedding one dll inside another as an embedded resource and then calling it from the code

Once you've embedded the third-party assembly as a resource, add code to subscribe to the AppDomain.AssemblyResolve event of the current domain during application start-up. This event fires whenever the Fusion sub-system of the CLR fails to locate an assembly according to the probing (policies) in effect. In the event handler for AppDomain.AssemblyResolve, load the resource using Assembly.GetManifestResourceStream and feed its content as a byte array into the corresponding Assembly.Load overload. Below is how one such implementation could look like in C#:

AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
{
    var resName = args.Name + ".dll";    
    var thisAssembly = Assembly.GetExecutingAssembly();    
    using (var input = thisAssembly.GetManifestResourceStream(resName))
    {
        return input != null 
             ? Assembly.Load(StreamToBytes(input))
             : null;
    }
};

where StreamToBytes could be defined as:

static byte[] StreamToBytes(Stream input) 
{
    var capacity = input.CanSeek ? (int) input.Length : 0;
    using (var output = new MemoryStream(capacity))
    {
        int readLength;
        var buffer = new byte[4096];

        do
        {
            readLength = input.Read(buffer, 0, buffer.Length);
            output.Write(buffer, 0, readLength);
        }
        while (readLength != 0);

        return output.ToArray();
    }
}

Finally, as a few have already mentioned, ILMerge may be another option to consider, albeit somewhat more involved.