C++ – Thread safe lazy construction of a singleton in C++


Is there a way to implement a singleton object in C++ that is:

  1. Lazily constructed in a thread safe manner (two threads might simultaneously be the first user of the singleton – it should still only be constructed once).
  2. Doesn't rely on static variables being constructed beforehand (so the singleton object is itself safe to use during the construction of static variables).

(I don't know my C++ well enough, but is it the case that integral and constant static variables are initialized before any code is executed (ie, even before static constructors are executed – their values may already be "initialized" in the program image)? If so – perhaps this can be exploited to implement a singleton mutex – which can in turn be used to guard the creation of the real singleton..)

Excellent, it seems that I have a couple of good answers now (shame I can't mark 2 or 3 as being the answer). There appears to be two broad solutions:

  1. Use static initialisation (as opposed to dynamic initialisation) of a POD static variable, and implementing my own mutex with that using the builtin atomic instructions. This was the type of solution I was hinting at in my question, and I believe I knew already.
  2. Use some other library function like pthread_once or boost::call_once. These I certainly didn't know about – and am very grateful for the answers posted.

Best Answer

Unfortunately, Matt's answer features what's called double-checked locking which isn't supported by the C/C++ memory model. (It is supported by the Java 1.5 and later — and I think .NET — memory model.) This means that between the time when the pObj == NULL check takes place and when the lock (mutex) is acquired, pObj may have already been assigned on another thread. Thread switching happens whenever the OS wants it to, not between "lines" of a program (which have no meaning post-compilation in most languages).

Furthermore, as Matt acknowledges, he uses an int as a lock rather than an OS primitive. Don't do that. Proper locks require the use of memory barrier instructions, potentially cache-line flushes, and so on; use your operating system's primitives for locking. This is especially important because the primitives used can change between the individual CPU lines that your operating system runs on; what works on a CPU Foo might not work on CPU Foo2. Most operating systems either natively support POSIX threads (pthreads) or offer them as a wrapper for the OS threading package, so it's often best to illustrate examples using them.

If your operating system offers appropriate primitives, and if you absolutely need it for performance, instead of doing this type of locking/initialization you can use an atomic compare and swap operation to initialize a shared global variable. Essentially, what you write will look like this:

MySingleton *MySingleton::GetSingleton() {
    if (pObj == NULL) {
        // create a temporary instance of the singleton
        MySingleton *temp = new MySingleton();
        if (OSAtomicCompareAndSwapPtrBarrier(NULL, temp, &pObj) == false) {
            // if the swap didn't take place, delete the temporary instance
            delete temp;

    return pObj;

This only works if it's safe to create multiple instances of your singleton (one per thread that happens to invoke GetSingleton() simultaneously), and then throw extras away. The OSAtomicCompareAndSwapPtrBarrier function provided on Mac OS X — most operating systems provide a similar primitive — checks whether pObj is NULL and only actually sets it to temp to it if it is. This uses hardware support to really, literally only perform the swap once and tell whether it happened.

Another facility to leverage if your OS offers it that's in between these two extremes is pthread_once. This lets you set up a function that's run only once - basically by doing all of the locking/barrier/etc. trickery for you - no matter how many times it's invoked or on how many threads it's invoked.