As @Secure notes, C's printf
function is inspired by BCPL's writef
function. And if you look at the Wikipedia page for BCPL, it includes an example showing that BCPL's writef
also used %
to introduce a format specifier.
So we can infer that C used %
either because BCPL did, or for the same reasons that BCPL did. My gut feeling is that it was simply that %
is one of the least commonly used ASCII characters ... or so the authors thought. It is also likely that they didn't spend much time weighing the various alternatives. At the time, both BCPL and C were obscure languages, and the authors most likely had more important things to deal with.
However, there is a minor spanner in the works. While C was inspired by BCPL, it is not entirely clear whether C borrowed BCPL's I/O libraries or the other way around. I dimly recall that BCPL's I/O libraries went through a process of evolution around the time that the infix byte-indexing operator was added to the language. (Actually, I think I know who would know about that.)
There are two 'forces' here, in tension: Performance vs. Readability.
Let's tackle the third problem first, though: long lines.
System.out.println("Good morning everyone. I am here today to present you with a very, very lengthy sentence in order to prove a point about how it looks strange amongst other code.");
The best way to handle this while keeping readability is to use string concatenation:
System.out.println("Good morning everyone. I am here today to present you "
        + "with a very, very lengthy sentence in order to prove a "
        + "point about how it looks strange amongst other code.");
The String-constant concatenation happens at compile time, and has no effect on runtime performance at all. The lines are readable, and you can just move on.
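The compile-time folding is easy to observe: in Java, the concatenation of two String literals is itself a constant expression, so the compiler emits a single interned literal. A minimal sketch (the class name is my own):

```java
public class ConstantConcat {
    public static void main(String[] args) {
        // Concatenating two String literals is a compile-time constant
        // expression, so the compiler folds it into one interned String.
        String joined = "Good " + "morning.";
        String single = "Good morning.";
        // Both variables refer to the very same interned object.
        System.out.println(joined == single); // prints true
    }
}
```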
Now, about the:
System.out.println("Good morning.");
System.out.println("Please enter your name");
vs.
System.out.println("Good morning.\nPlease enter your name");
The second option is significantly faster; I would estimate about 2× as fast. Why?
Because roughly 90% (with a wide margin of error) of the work is not dumping the characters to the output, but the overhead needed to secure access to the output before writing to it.
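If you have more than two lines, a sketch of paying that overhead only once is to build the whole payload first and make a single call (String.join has been standard since Java 8; the class name is my own):

```java
public class OneCall {
    public static void main(String[] args) {
        // One println pays the synchronization and flush cost once;
        // String.join builds the combined payload with embedded newlines.
        System.out.println(String.join("\n",
                "Good morning.",
                "Please enter your name"));
    }
}
```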
Synchronization
System.out
is a PrintStream
. All Java implementations that I know of synchronize internally on the PrintStream; see the source on GrepCode.
What does this mean for your code?
It means that each time you call System.out.println(...)
you synchronize your memory model and check for, and possibly wait on, a lock. Any other threads calling System.out will also be blocked.
In single-threaded applications the impact of System.out.println()
is often limited by the IO performance of your system: how fast you can write to a file. In multithreaded applications, the locking can be more of an issue than the IO.
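If you do need several lines to come out together in a multithreaded program, one sketch is to hold the stream's monitor yourself across the calls. Note that this leans on the undocumented (but common) implementation detail that PrintStream synchronizes on itself:

```java
public class BatchedOutput {
    public static void main(String[] args) {
        // Holding System.out's monitor across several prints means no
        // other thread's println can interleave between these lines.
        // This relies on PrintStream synchronizing on `this` internally,
        // which is an implementation detail, not a documented guarantee.
        synchronized (System.out) {
            System.out.println("Good morning.");
            System.out.println("Please enter your name");
        }
    }
}
```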
Flushing
Each println is flushed. This clears the buffers and triggers a console-level write. How much work that involves is implementation-dependent, but it is generally understood that the cost of a flush is only in small part related to the size of the buffer being flushed. There is a significant fixed overhead per flush, where memory buffers are marked as dirty, the virtual machine performs IO, and so on. Incurring that overhead once, instead of twice, is an obvious optimization.
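A sketch of sidestepping the per-line flush entirely: wrap System.out in a PrintWriter with autoFlush disabled and a buffer in between, then flush once at the end (the class name and structure are my own, not a standard recipe):

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class BufferedConsole {
    public static void main(String[] args) {
        // autoFlush = false: println() no longer flushes after every line.
        PrintWriter out = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(System.out)), false);
        for (int i = 0; i < 1000; i++) {
            out.println("line " + i);
        }
        out.flush(); // one flush at the end instead of one per line
    }
}
```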
Some numbers
I put together the following little test:
public class ConsolePerf {
    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            benchmark("Warm " + i);
        }
        benchmark("real");
    }

    private static void benchmark(String string) {
        benchString(string + "short", "This is a short String");
        benchString(string + "long", "This is a long String with a number of newlines\n"
                + "in it, that should simulate\n"
                + "printing some long sentences and log\n"
                + "messages.");
    }

    private static final int REPS = 1000;

    private static void benchString(String name, String value) {
        long time = System.nanoTime();
        for (int i = 0; i < REPS; i++) {
            System.out.println(value);
        }
        double ms = (System.nanoTime() - time) / 1000000.0;
        System.err.printf("%s run in%n %12.3fms%n %12.3f lines per ms%n %12.3f chars per ms%n",
                name, ms, REPS / ms, REPS * (value.length() + 1) / ms);
    }
}
The code is relatively simple: it repeatedly prints either a short or a long string to the output. The long string has multiple newlines in it. The benchmark measures how long it takes to print 1000 iterations of each.
If I run it at the Unix (Linux) command prompt, redirect STDOUT
to /dev/null
, and print the actual results to STDERR
, I can do the following:
java -cp . ConsolePerf > /dev/null 2> ../errlog
The output (in errlog) looks like:
Warm 0short run in
7.264ms
137.667 lines per ms
3166.345 chars per ms
Warm 0long run in
1.661ms
602.051 lines per ms
74654.317 chars per ms
Warm 1short run in
1.615ms
619.327 lines per ms
14244.511 chars per ms
Warm 1long run in
2.524ms
396.238 lines per ms
49133.487 chars per ms
.......
Warm 99short run in
1.159ms
862.569 lines per ms
19839.079 chars per ms
Warm 99long run in
1.213ms
824.393 lines per ms
102224.706 chars per ms
realshort run in
1.204ms
830.520 lines per ms
19101.959 chars per ms
reallong run in
1.215ms
823.160 lines per ms
102071.811 chars per ms
What does this mean? Let me repeat the last 'stanza':
realshort run in
1.204ms
830.520 lines per ms
19101.959 chars per ms
reallong run in
1.215ms
823.160 lines per ms
102071.811 chars per ms
It means that, for all intents and purposes, even though the 'long' line is about 5 times longer, and contains multiple newlines, it takes just about as long to output as the short line.
The number of characters per millisecond for the long run is about 5 times higher, while the elapsed time is about the same.
In other words, your performance scales relative to the number of printlns you have, not what they print.
Update: What happens if you redirect to a file, instead of to /dev/null?
realshort run in
2.592ms
385.815 lines per ms
8873.755 chars per ms
reallong run in
2.686ms
372.306 lines per ms
46165.955 chars per ms
It is a whole lot slower, but the proportions are about the same.
Best Answer
The original C language was designed in 1969–1972, when computing was still dominated by the 80-column punched card. Its designers used narrow output devices such as the ASR-33 Teletype. These devices did not automatically wrap text, so there was a real incentive to keep source code within 80 columns. Fortran and Cobol had explicit continuation mechanisms for this, before they finally moved to free format.
It was a stroke of brilliance for Dennis Ritchie (I assume) to realise that there was no ambiguity in the grammar and that long ASCII strings could be made to fit into 80 columns by the simple expedient of getting the compiler to concatenate adjacent literal strings. Countless C programmers were grateful for that small feature.
Once the feature is in, why would it ever be removed? It causes no grief and is frequently handy. I for one wish more languages had it. The modern trend is to have extended strings with triple quotes or other symbols, but the simplicity of this feature in C has never been outdone.
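Java is one of the languages that later took the triple-quote route, with text blocks (standard since Java 15); a trailing backslash inside a text block suppresses the newline, giving much the same effect as C's adjacent-literal concatenation. A sketch (class name is my own):

```java
public class TextBlockDemo {
    public static void main(String[] args) {
        // A text block split across source lines; each trailing backslash
        // suppresses its newline, yielding one long line at runtime.
        String s = """
                Good morning everyone. I am here today to present you \
                with a very, very lengthy sentence in order to prove a \
                point about how it looks strange amongst other code.""";
        System.out.println(s);
    }
}
```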