C# – Convert ANSI (Windows 1252) to UTF8 in C#

c#, character-encoding, .net, special-characters, string

I've asked this before here on Stack Overflow in a roundabout manner, and I want to get it right this time. How do I convert ANSI (code page 1252) to UTF-8 while preserving the special characters? (I am aware that UTF-8 supports a larger character set than ANSI, but it is okay if I can preserve all UTF-8 characters that are supported by ANSI and substitute the rest with a ? or something.)

Why I Want To Convert ANSI → UTF-8

I am basically writing a program that splits vCard files (VCF) into individual files, each containing a single contact. I've noticed that Nokia and Sony Ericsson phones save the backup VCF file in UTF-8 (without BOM), but Android saves it in ANSI (1252). And God knows what formats the other phones save them in!

So my questions are

  1. Isn't there an industry standard for vCard files' character encoding?
  2. Which is easier for solving my problem: converting ANSI to UTF-8 (and/or the other way round), or trying to detect which encoding the input file has and notifying the user about it?

tl;dr
Need to know how to convert the character encoding from (ANSI / UTF8) to (UTF8 / ANSI) while preserving all special characters.

Best Answer

You shouldn't convert from one encoding to the other. You have to read each file using the encoding that it was created with, or you will lose information.

Once you have read the file using the correct encoding, you have the content as a Unicode string. From there you can save it using any encoding you like.
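As a rough illustration of the read-then-save idea, here is a minimal sketch. It assumes the input file is known to be Windows-1252. The class and method names (`VCardTranscoder`, `Windows1252ToUtf8`, `ToWindows1252Bytes`) are made up for this example, and the `RegisterProvider` call is only needed on .NET Core / .NET 5+, where code page 1252 is not available by default.

```csharp
using System.IO;
using System.Text;

static class VCardTranscoder
{
    static VCardTranscoder()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings must be registered before GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: decode a Windows-1252 file into a string,
    // then write that string back out as UTF-8 without a BOM.
    public static void Windows1252ToUtf8(string inputPath, string outputPath)
    {
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // Bytes -> Unicode string, using the encoding the file was written in.
        string text = File.ReadAllText(inputPath, windows1252);

        // Unicode string -> UTF-8 bytes; 'false' suppresses the byte order mark.
        File.WriteAllText(outputPath, text, new UTF8Encoding(false));
    }

    // Hypothetical helper for the opposite direction: characters that
    // Windows-1252 cannot represent are replaced with '?'.
    public static byte[] ToWindows1252Bytes(string text)
    {
        Encoding lossy = Encoding.GetEncoding(
            1252,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        return lossy.GetBytes(text);
    }
}
```

Note that `File.ReadAllText` also checks for a byte order mark, so a file that starts with a UTF-8 BOM would be decoded as UTF-8 even though 1252 was requested.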

If you need to detect the encoding, you can read the file as bytes and then look for byte values or sequences that are valid in only one of the two encodings. If the file contains no special characters, either encoding will work, because the characters 32..127 (in fact the whole ASCII range 0..127) are encoded identically in both.
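One common way to approximate this detection, sketched below, is to try a strict UTF-8 decode first and fall back to Windows-1252 only if the bytes are not valid UTF-8. This is a heuristic, not a guarantee: it differs slightly from scanning for specific character codes, and it assumes the file can only be one of these two encodings. The names `EncodingGuess` and `DecodeUtf8OrWindows1252` are invented for this example.

```csharp
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: returns the decoded text, preferring UTF-8.
    public static string DecodeUtf8OrWindows1252(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder raise an exception
        // instead of silently inserting U+FFFD for malformed sequences.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            // Pure ASCII input also succeeds here, which is harmless
            // because ASCII decodes identically in both encodings.
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume Windows-1252.
            // Assumption: on .NET Core / .NET 5+ the code page provider
            // has to be registered before code page 1252 can be used.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }
}
```

For the vCard splitter, the bytes would typically come from `File.ReadAllBytes(path)`. Text written in Windows-1252 with accented characters rarely happens to form valid UTF-8 multi-byte sequences, which is why this check tends to work in practice, but a file in some third encoding would be misread.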