C# – Convert ANSI (Windows 1252) to UTF8 in C#

ccharacter encodingnetspecial charactersstring

I've asked this before in a round-about manner before here on Stack Overflow, and want to get it right this time. How do I convert ANSI (Codepage 1252) to UTF-8, while preserving the special characters? (I am aware that UTF-8 supports a larger character set than ANSI, but it is okay if I can preserve all UTF-8 characters that are supported by ANSI and substitute the rest with a ? or something)

Why I Want To Convert ANSI → UTF-8

I am basically writing a program that splits vCard files (VCF) into individual files, each containing a single contact. I've noticed that Nokia and Sony Ericsson phones save the backup VCF file in UTF-8 (without BOM), but Android saves it in ANSI (1252). And God knows in what formats the other phones save them in!

So my questions are

Isn't there an industry standard for vCard files' character encoding?
Which is easier for my solving my problem? Converting ANSI to UTF8 (and/or the other way round) or trying to detect which encoding the input file has and notifying the user about it?

tl;dr
Need to know how to convert the character encoding from (ANSI / UTF8) to (UTF8 / ANSI) while preserving all special characters.

Best Answer

You shouldn't convert from one encoding to the other. You have to read each file using the encoding that it was created with, or you will lose information.

Once you read the file using the correct encoding you have the content as a Unicode string, from there you can save it using any encoding you like.

If you need to detect the encoding, you can read the file as bytes and then look for character codes that are specific for either encoding. If the file contains no special characters, either encoding will work as the characters 32..127 are the same for both encodings.

Original Answer

In .NET it's rather ugly (until 4 or above):

StatusEnum MyStatus = (StatusEnum) Enum.Parse(typeof(StatusEnum), "Active", true);

I tend to simplify this with:

public static T ParseEnum<T>(string value)
{
    return (T) Enum.Parse(typeof(T), value, true);
}

Then I can do:

StatusEnum MyStatus = EnumUtil.ParseEnum<StatusEnum>("Active");

One option suggested in the comments is to add an extension, which is simple enough:

public static T ToEnum<T>(this string value)
{
    return (T) Enum.Parse(typeof(T), value, true);
}

StatusEnum MyStatus = "Active".ToEnum<StatusEnum>();

Finally, you may want to have a default enum to use if the string cannot be parsed:

public static T ToEnum<T>(this string value, T defaultValue) 
{
    if (string.IsNullOrEmpty(value))
    {
        return defaultValue;
    }

    T result;
    return Enum.TryParse<T>(value, true, out result) ? result : defaultValue;
}

Which makes this the call:

StatusEnum MyStatus = "Active".ToEnum(StatusEnum.None);

However, I would be careful adding an extension method like this to string as (without namespace control) it will appear on all instances of string whether they hold an enum or not (so 1234.ToString().ToEnum(StatusEnum.None) would be valid but nonsensical) . It's often be best to avoid cluttering Microsoft's core classes with extra methods that only apply in very specific contexts unless your entire development team has a very good understanding of what those extensions do.

Javascript – Convert JavaScript String to be all lower case

var lowerCaseName = "Your Name".toLowerCase();

Best Answer

Related Solutions

C# – Convert a string to an enum in C#

Original Answer

Javascript – Convert JavaScript String to be all lower case

Related Topic