C++ Strings – Compile-Time Checking for NULL Initialized std::string

c, c++11, null, strings

This is sort of the complementary question to "How to best protect from 0 passed to std::string parameters?". Basically, I'm trying to figure out whether there is a way to have the compiler warn me if a code path would unconditionally call std::string's const char* constructor with NULL.

Run time checks are all well and good, but for a case like:

std::string get_first(const std::string& foo) {
    if (foo.empty()) return 0; // Or NULL, or nullptr
    return foo.substr(0, 1);
}

it's annoying that this still passes compilation under gcc, clang, etc., even with -std=c++11 -Wall -Wextra -pedantic -Werror, although the code is guaranteed to fail if that path is exercised and the system header is usually annotated with the precondition that the pointer must not be null. I can block the specific case of 0 on gcc with -Werror=zero-as-null-pointer-constant, but that doesn't help with NULL or nullptr, and it tackles a related but different problem. The major issue is that a programmer can make this mistake with 0, NULL or nullptr and not notice it if the code path is never exercised.

Is it possible to force this check to be compile time, covering an entire code base, without nonsense like replacing std::string with a special subclass throughout the code?

Best Answer

There are different approaches, depending on whether you need something that works on all compilers (but only in very restricted circumstances and with some side effects), or something that works on only some compilers but catches far more cases:

  1. Adding overloads which complain when used. The standard [[deprecated]] attribute (C++14; the attribute syntax itself is C++11) produces a warning at the immediate call site, as long as the warning isn't suppressed, as it normally is inside a system header.

    GCC and Clang provide a better-suited custom attribute, __attribute__((error("message"))), which always breaks the build if the function is used and names the call site.

    The problem with adding overloads that accept everything which could be a null pointer literal is that they might confuse SFINAE in other templates, thus breaking code, and that they cannot catch an argument which already has type char* and just happens to be a null pointer:

    // added overload, common lines:
    template<class X, class = typename
        std::enable_if<std::is_arithmetic<X>::value && !std::is_pointer<X>::value>::type>
    // best on GCC / clang:
    string(X) __attribute__((error("nullpointer")));
    // best on others:
    [[deprecated("UB")]] string(X) /* no definition, link breaks */;
    
  2. The preferable alternative is marking the argument as non-null and letting the compiler's optimizer figure it out. You are asking for and heeding such warnings, right?
    That's only an option in GCC and Clang, but it avoids the disadvantages of additional overloads and catches all cases the compiler can figure out, which means it works better at higher optimization levels.

    basic_string(const CharT* s,
                 size_type count,
                 const Allocator& alloc = Allocator()) __attribute__((nonnull));
    basic_string(const CharT* s,
                 const Allocator& alloc = Allocator()) __attribute__((nonnull));
    

    More generally, one can ask GCC or Clang whether it can determine the value of an expression at compile time, using __builtin_constant_p.
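
For approach 1, here is a minimal sketch of the trap overload on a stand-in class rather than on std::string itself, since the standard header can't be edited portably. Everything named here is hypothetical: guarded_string, the NULL_STRING_GUARD macro, demo_guarded_string and this version of get_first are made up for illustration. The nullptr_t case is an addition beyond the is_arithmetic test in the snippet for approach 1, because nullptr is not an arithmetic type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Pick the strongest diagnostic available. __has_attribute is a GCC/Clang
// extension, so it is only probed when the preprocessor knows it.
#if defined(__has_attribute)
#  if __has_attribute(error)
#    define NULL_STRING_GUARD __attribute__((error("string built from a null pointer constant")))
#  endif
#endif
#ifndef NULL_STRING_GUARD
#  define NULL_STRING_GUARD [[deprecated("string built from a null pointer constant")]]
#endif

// Hypothetical stand-in for std::string; not a real library type.
class guarded_string {
public:
    guarded_string(const char* s) : value_(s) {}
    guarded_string(const std::string& s) : value_(s) {}

    // Trap overload: an exact match for 0, for NULL when it is an integer
    // constant, and for nullptr. Declared only, never defined.
    template <class X, class = typename std::enable_if<
        std::is_arithmetic<X>::value ||
        std::is_same<X, std::nullptr_t>::value>::type>
    NULL_STRING_GUARD guarded_string(X);

    const std::string& str() const { return value_; }

private:
    std::string value_;
};

// The function from the question, rewritten against the stand-in type.
guarded_string get_first(const std::string& foo) {
    // if (foo.empty()) return 0;  // would select the trap overload
    if (foo.empty()) return guarded_string("");
    return guarded_string(foo.substr(0, 1));
}

// Ordinary string arguments never reach the trap overload.
inline void demo_guarded_string() {
    assert(guarded_string("abc").str() == "abc");
    assert(get_first("hello").str() == "h");
}
```

Uncommenting the `return 0;` line should stop the build at that line on GCC, and on Clang versions that support the error attribute. Other compilers fall back to a deprecation warning followed by a link error, since the trap is never defined. As noted for approach 1, a char* variable that happens to be null at run time still slips through.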

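Approach 2 can be tried out without touching the library header by putting the attribute on a function of your own. This is a minimal sketch, assuming GCC or Clang. make_string, provably_null, demo_nonnull and the ARG_NONNULL macro are hypothetical names, and on other compilers the macro expands to nothing, so the check silently disappears.

```cpp
#include <cassert>
#include <string>

// nonnull is a GCC/Clang extension; expand to nothing elsewhere.
#if defined(__GNUC__)
#  define ARG_NONNULL __attribute__((nonnull))
#else
#  define ARG_NONNULL
#endif

// Hypothetical wrapper around std::string's const char* constructor.
// The attribute goes on a separate declaration, because GCC does not accept
// it after the declarator of a function definition.
std::string make_string(const char* s) ARG_NONNULL;

std::string make_string(const char* s) {
    return std::string(s);
}

// Returns true only where the compiler has folded `s == nullptr` to a
// constant at this call site and that constant is true. The result depends
// on optimization level and inlining, so false does not prove s is non-null.
inline bool provably_null(const char* s) {
#if defined(__GNUC__)
    return __builtin_constant_p(s == nullptr) && s == nullptr;
#else
    (void)s;
    return false;
#endif
}

inline void demo_nonnull() {
    assert(make_string("abc") == "abc");
    // make_string(nullptr);  // expected to be flagged by -Wnonnull
    assert(!provably_null("abc"));
}
```

With -Wall, GCC typically reports a -Wnonnull warning for a call such as make_string(nullptr), and -Werror=nonnull turns that into a build failure. provably_null only returns true where the optimizer has actually reduced the argument to a constant, so its result can differ between -O0 and -O2.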