Printf Format Specifier – Why Was the Percent Sign (%) Chosen as the Format Specifier for the Printf Family of Functions?


Everyone knows that, at least in C, you use the printf family of functions to print a formatted string. And these functions use a percent sign (%) to indicate the beginning of a format specifier. For example, %d means to print an int, and %u means to print an unsigned int. If you're unfamiliar with how the printf function and format placeholders work, or simply need a refresher, the Wikipedia article is a good place to start.

My question is: was there a particularly compelling reason why % was originally chosen as the format specifier, and is there any reason it should be chosen again in the future?

Obviously the decision was made a long time ago (very likely for a predecessor of even the C language), and it's been more or less "standard" ever since then (not only in C, but also in a vast array of other languages that have adopted its syntax to varying degrees), so it's far too late to ever change. But I'm still curious if anyone has any insight on why this choice might have been made in the first place, and whether it still makes sense as the choice if one is designing a new language with similar functionality.

For example, in C# (and the other .NET languages), Microsoft made a slightly different decision about how the string formatting functions work. Because some degree of type safety can be enforced there (unlike with printf in C), the format string does not need to indicate the type of the corresponding parameter. They also decided to use zero-indexed pairs of curly braces ({0}, {1}, ...) as format specifiers, like so:

string output = String.Format("In {0}, the temperature is {1} degrees Celsius.",
                              "Texas", 37);
Console.WriteLine(output);

// Output:
//     In Texas, the temperature is 37 degrees Celsius.

The documentation for the String.Format method contains more information, as does this article on composite formatting in general, but the exact details are rather unimportant. The point is simply that they abandoned the long-standing practice of using % to indicate the beginning of a format specifier. The C language could just as easily have used {d} and {u}, but it didn't. Does anyone have any thoughts on why, whether this decision makes sense in retrospect, and whether new implementations should follow it?

Obviously there's no character that could be chosen that wouldn't have to be escapable so that it could be included in the string itself, but that problem is quite well solved already by just using two of them. What other considerations are relevant?

Best Answer

As @Secure notes, C's printf function is inspired by BCPL's writef function. And if you look at the Wikipedia page for BCPL, it has an example that shows that BCPL's writef also used % to introduce a format specifier.

So we can infer that C used % either because BCPL did, or for the same reasons that BCPL did. My gut feeling is that it was simply that % is one of the least commonly used ASCII characters ... or so the authors thought. It is also likely that they didn't spend a lot of time weighing the various alternatives. At the time, both BCPL and C were obscure languages, and the authors most likely had more important things to deal with.

However, there is a minor spanner in the works. While C was inspired by BCPL, it is not entirely clear whether C borrowed BCPL's I/O libraries or the other way around. I dimly recall that BCPL's I/O libraries went through a process of evolution around the time that the infix byte indexing operator was added to the language. (Actually, I think I know who would know about that.)
