C++ – Compiling UTF-8 encoded source with Unicode line separators

c, compiler-construction, utf-8, visual c++, visual studio

Using the latest version of the Microsoft compiler (included with the Win7 SDK), I'm attempting to compile a source file that's encoded in UTF-8 with Unicode line separators.

Unfortunately, the code will not build, even if I include the UTF-8 signature (BOM) at the start of the file. For example, if I try to compile this:

#include <stdio.h>

int main (void)
{
    printf("Hello!");
    return 0;
}

I'll see the following error:


Prompt> cl test.c

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.

test.c
test.c(1) : warning C4067: unexpected tokens following preprocessor directive - expected a newline
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.

/out:test.exe
test.obj
LINK : fatal error LNK1561: entry point must be defined


Has anyone encountered this problem before? Any solutions?

Thanks!
Andrew

Best Answer

When you say "Unicode line separators", do you mean UTF-16/UCS-2 line endings (i.e., 16-bit characters)? If that's the case (the file is a mix of different encodings), I'd say the only reasonable option is to fix the files.

If you mean the line endings are some other Unicode code point (such as U+2028 LINE SEPARATOR, still encoded in UTF-8), then you'll still need to fix the files. The standard says this about the first phase of translation:

Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing newline characters for end-of-line indicators) if necessary.

Apparently the Microsoft compiler does not perform this mapping for Unicode line separators, so you'll need to convert them to ordinary newlines yourself before compiling.
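
Here is a minimal sketch of that fix in C. It assumes the separators in question are U+2028 (LINE SEPARATOR) or U+2029 (PARAGRAPH SEPARATOR), which UTF-8 encodes as the bytes E2 80 A8 and E2 80 A9. The function name normalize_unicode_line_separators is made up for this illustration; it is not part of any SDK or library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: rewrites buf[0..len) in place, replacing every
 * UTF-8 encoded U+2028 (E2 80 A8) or U+2029 (E2 80 A9) with a single
 * '\n'. All other bytes are copied unchanged. Returns the new length,
 * which is never larger than len.
 */
size_t normalize_unicode_line_separators(unsigned char *buf, size_t len)
{
    size_t in = 0;   /* read position  */
    size_t out = 0;  /* write position */

    assert(buf != NULL || len == 0);

    while (in < len) {
        if (in + 2 < len &&
            buf[in] == 0xE2 && buf[in + 1] == 0x80 &&
            (buf[in + 2] == 0xA8 || buf[in + 2] == 0xA9)) {
            buf[out++] = '\n';  /* three-byte separator becomes one newline */
            in += 3;
        } else {
            buf[out++] = buf[in++];
        }
    }
    return out;
}
```

To use it, read the whole source file into a buffer in binary mode, call the function, and write the first N bytes back out (N being the return value) before handing the file to cl. Any other end-of-line code points in your files would need their own byte patterns added.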
