Contrary to the answers here, you DON'T need to worry about encoding if the bytes don't need to be interpreted!
Like you mentioned, your goal is, simply, to "get what bytes the string has been stored in".
(And, of course, to be able to re-construct the string from the bytes.)
For those goals, I honestly do not understand why people keep telling you that you need the encodings. You certainly do NOT need to worry about encodings for this.
Just do this instead:
static byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

// Do NOT use on arbitrary bytes; only use on GetBytes's output on the SAME system
static string GetString(byte[] bytes)
{
    char[] chars = new char[bytes.Length / sizeof(char)];
    System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    return new string(chars);
}
As long as your program (or any other program) doesn't try to interpret the bytes somehow, which you never said you intend to do, there is nothing wrong with this approach! Worrying about encodings just makes your life more complicated for no real reason.
Additional benefit to this approach: It doesn't matter if the string contains invalid characters, because you can still get the data and reconstruct the original string anyway!
It will be encoded and decoded just the same, because you are just looking at the bytes.
If you used a specific encoding, though, it would've given you trouble with encoding/decoding invalid characters.
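To make the "invalid characters" point concrete, here is a minimal sketch that compares the raw copy with a UTF-8 round trip. The class name RoundTripDemo, the two helper methods, and the test string (a lone high surrogate between two letters) are my own assumptions for illustration; GetBytes and GetString are the same methods as above.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Same approach as GetBytes above: copy the raw UTF-16 code units.
    public static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    public static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }

    // Hypothetical helper: true if the raw copy gives back the same string.
    public static bool RawCopyRoundTrips(string s)
    {
        return GetString(GetBytes(s)) == s;
    }

    // Hypothetical helper: true if encoding to UTF-8 and back gives the same string.
    public static bool Utf8RoundTrips(string s)
    {
        return Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(s)) == s;
    }

    public static void Main()
    {
        // Assumed test input: a lone high surrogate, which is not valid UTF-16 text.
        string invalid = "a\uD800b";

        Console.WriteLine("Raw copy round-trips: " + RawCopyRoundTrips(invalid));
        Console.WriteLine("UTF-8 round-trips:    " + Utf8RoundTrips(invalid));
    }
}
```

With the default Encoding.UTF8 settings the lone surrogate is swapped for the replacement character U+FFFD, so the decoded string no longer matches the original, while the raw copy does.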
I've been experimenting with the various methods .NET provides for URL encoding. Perhaps the following table will be useful (it is the output from a test app I wrote):
Unencoded UrlEncoded UrlEncodedUnicode UrlPathEncoded EscapedDataString EscapedUriString HtmlEncoded HtmlAttributeEncoded HexEscaped
A         A          A                 A              A                 A                A           A                    %41
B         B          B                 B              B                 B                B           B                    %42
a         a          a                 a              a                 a                a           a                    %61
b         b          b                 b              b                 b                b           b                    %62
0         0          0                 0              0                 0                0           0                    %30
1         1          1                 1              1                 1                1           1                    %31
[space]   +          +                 %20            %20               %20              [space]     [space]              %20
!         !          !                 !              !                 !                !           !                    %21
"         %22        %22               "              %22               %22              &quot;      &quot;               %22
#         %23        %23               #              %23               #                #           #                    %23
$         %24        %24               $              %24               $                $           $                    %24
%         %25        %25               %              %25               %25              %           %                    %25
&         %26        %26               &              %26               &                &amp;       &amp;                %26
'         %27        %27               '              '                 '                &#39;       &#39;                %27
(         (          (                 (              (                 (                (           (                    %28
)         )          )                 )              )                 )                )           )                    %29
*         *          *                 *              %2A               *                *           *                    %2A
+         %2b        %2b               +              %2B               +                +           +                    %2B
,         %2c        %2c               ,              %2C               ,                ,           ,                    %2C
-         -          -                 -              -                 -                -           -                    %2D
.         .          .                 .              .                 .                .           .                    %2E
/         %2f        %2f               /              %2F               /                /           /                    %2F
:         %3a        %3a               :              %3A               :                :           :                    %3A
;         %3b        %3b               ;              %3B               ;                ;           ;                    %3B
<         %3c        %3c               <              %3C               %3C              &lt;        &lt;                 %3C
=         %3d        %3d               =              %3D               =                =           =                    %3D
>         %3e        %3e               >              %3E               %3E              &gt;        >                    %3E
?         %3f        %3f               ?              %3F               ?                ?           ?                    %3F
@         %40        %40               @              %40               @                @           @                    %40
[         %5b        %5b               [              %5B               %5B              [           [                    %5B
\         %5c        %5c               \              %5C               %5C              \           \                    %5C
]         %5d        %5d               ]              %5D               %5D              ]           ]                    %5D
^         %5e        %5e               ^              %5E               %5E              ^           ^                    %5E
_         _          _                 _              _                 _                _           _                    %5F
`         %60        %60               `              %60               %60              `           `                    %60
{         %7b        %7b               {              %7B               %7B              {           {                    %7B
|         %7c        %7c               |              %7C               %7C              |           |                    %7C
}         %7d        %7d               }              %7D               %7D              }           }                    %7D
~         %7e        %7e               ~              ~                 ~                ~           ~                    %7E
Ā         %c4%80     %u0100            %c4%80         %C4%80            %C4%80           Ā           Ā                    [OoR]
ā         %c4%81     %u0101            %c4%81         %C4%81            %C4%81           ā           ā                    [OoR]
Ē         %c4%92     %u0112            %c4%92         %C4%92            %C4%92           Ē           Ē                    [OoR]
ē         %c4%93     %u0113            %c4%93         %C4%93            %C4%93           ē           ē                    [OoR]
Ī         %c4%aa     %u012a            %c4%aa         %C4%AA            %C4%AA           Ī           Ī                    [OoR]
ī         %c4%ab     %u012b            %c4%ab         %C4%AB            %C4%AB           ī           ī                    [OoR]
Ō         %c5%8c     %u014c            %c5%8c         %C5%8C            %C5%8C           Ō           Ō                    [OoR]
ō         %c5%8d     %u014d            %c5%8d         %C5%8D            %C5%8D           ō           ō                    [OoR]
Ū         %c5%aa     %u016a            %c5%aa         %C5%AA            %C5%AA           Ū           Ū                    [OoR]
ū         %c5%ab     %u016b            %c5%ab         %C5%AB            %C5%AB           ū           ū                    [OoR]
The columns represent encodings as follows:
UrlEncoded: HttpUtility.UrlEncode
UrlEncodedUnicode: HttpUtility.UrlEncodeUnicode
UrlPathEncoded: HttpUtility.UrlPathEncode
EscapedDataString: Uri.EscapeDataString
EscapedUriString: Uri.EscapeUriString
HtmlEncoded: HttpUtility.HtmlEncode
HtmlAttributeEncoded: HttpUtility.HtmlAttributeEncode
HexEscaped: Uri.HexEscape
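As a quick illustration of how two of these columns differ in practice, here is a minimal sketch that calls HttpUtility.UrlEncode and Uri.EscapeDataString on the same input. The class name, the helper names, and the sample string "a b&c" are assumptions of mine; the sample was picked because the space and ampersand rows of the table show the difference clearly.

```csharp
using System;
// On .NET Framework this needs a reference to the System.Web assembly.
using System.Web;

static class QueryValueDemo
{
    // Hypothetical wrapper around the "UrlEncoded" column's method.
    public static string WithUrlEncode(string value)
    {
        return HttpUtility.UrlEncode(value);
    }

    // Hypothetical wrapper around the "EscapedDataString" column's method.
    public static string WithEscapeDataString(string value)
    {
        return Uri.EscapeDataString(value);
    }

    public static void Main()
    {
        string value = "a b&c"; // assumed sample input

        // UrlEncode turns the space into '+', EscapeDataString into "%20";
        // both escape the ampersand as "%26".
        Console.WriteLine(WithUrlEncode(value));
        Console.WriteLine(WithEscapeDataString(value));
    }
}
```

Which form you want depends on what consumes the URL: '+' for a space is the form-encoding convention, while "%20" is safe anywhere in a URI.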
NOTES:
Uri.HexEscape can only handle characters with a code point of 255 or less. Therefore it throws an ArgumentOutOfRangeException for the Latin Extended-A characters (e.g. Ā).
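The HexEscape limit is easy to check in isolation. Below is a small sketch under my own naming (HexEscapeDemo and TryHexEscape are hypothetical); it mirrors the try/catch used to produce the [OoR] cells in the table.

```csharp
using System;

static class HexEscapeDemo
{
    // Returns the escaped form, or "[OoR]" when Uri.HexEscape rejects the
    // character because its code point is above 255.
    public static string TryHexEscape(char character)
    {
        try
        {
            return Uri.HexEscape(character);
        }
        catch (ArgumentOutOfRangeException)
        {
            return "[OoR]";
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryHexEscape('A'));      // within range
        Console.WriteLine(TryHexEscape('\u00FF')); // highest accepted code point
        Console.WriteLine(TryHexEscape('\u0100')); // Ā, out of range
    }
}
```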
This table was generated in .NET 4.0 (see Levi Botelho's comment below that says the encoding in .NET 4.5 is slightly different).
EDIT:
I've added a second table with the encodings for .NET 4.5. See this answer: https://stackoverflow.com/a/21771206/216440
EDIT 2:
Since people seem to appreciate these tables, I thought you might like the source code that generates the table, so you can play around yourselves. It's a simple C# console application, which can target either .NET 4.0 or 4.5:
using System;
using System.Collections.Generic;
using System.Text;
// Need to add a Reference to the System.Web assembly.
using System.Web;

namespace UriEncodingDEMO2
{
    class Program
    {
        static void Main(string[] args)
        {
            EncodeStrings();

            Console.WriteLine();
            Console.WriteLine("Press any key to continue...");
            Console.Read();
        }

        public static void EncodeStrings()
        {
            string stringToEncode = "ABCD" + "abcd"
                + "0123" + " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~" + "ĀāĒēĪīŌōŪū";

            // Need to set the console encoding to display non-ASCII characters correctly (eg the
            // Latin Extended-A characters such as ĀāĒē...).
            Console.OutputEncoding = Encoding.UTF8;

            // Will also need to set the console font (in the console Properties dialog) to a font
            // that displays the extended character set correctly.
            // The following fonts all display the extended characters correctly:
            //     Consolas
            //     DejaVu Sans Mono
            //     Lucida Console

            // Also, in the console Properties, set the Screen Buffer Size and the Window Size
            // Width properties to at least 140 characters, to display the full width of the
            // table that is generated.

            Dictionary<string, Func<string, string>> columnDetails =
                new Dictionary<string, Func<string, string>>();
            columnDetails.Add("Unencoded", (unencodedString => unencodedString));
            columnDetails.Add("UrlEncoded",
                (unencodedString => HttpUtility.UrlEncode(unencodedString)));
            columnDetails.Add("UrlEncodedUnicode",
                (unencodedString => HttpUtility.UrlEncodeUnicode(unencodedString)));
            columnDetails.Add("UrlPathEncoded",
                (unencodedString => HttpUtility.UrlPathEncode(unencodedString)));
            columnDetails.Add("EscapedDataString",
                (unencodedString => Uri.EscapeDataString(unencodedString)));
            columnDetails.Add("EscapedUriString",
                (unencodedString => Uri.EscapeUriString(unencodedString)));
            columnDetails.Add("HtmlEncoded",
                (unencodedString => HttpUtility.HtmlEncode(unencodedString)));
            columnDetails.Add("HtmlAttributeEncoded",
                (unencodedString => HttpUtility.HtmlAttributeEncode(unencodedString)));
            columnDetails.Add("HexEscaped",
                (unencodedString
                    =>
                    {
                        // Uri.HexEscape can only handle code points up to 255, so for the
                        // Latin Extended-A characters, such as Ā, it will throw an
                        // ArgumentOutOfRangeException.
                        try
                        {
                            return Uri.HexEscape(unencodedString.ToCharArray()[0]);
                        }
                        catch
                        {
                            return "[OoR]";
                        }
                    }));

            char[] charactersToEncode = stringToEncode.ToCharArray();
            string[] stringCharactersToEncode = Array.ConvertAll<char, string>(charactersToEncode,
                (character => character.ToString()));
            DisplayCharacterTable<string>(stringCharactersToEncode, columnDetails);
        }

        private static void DisplayCharacterTable<TUnencoded>(TUnencoded[] unencodedArray,
            Dictionary<string, Func<TUnencoded, string>> mappings)
        {
            // Header row: one column per encoder, in insertion order.
            foreach (string key in mappings.Keys)
            {
                Console.Write(key.Replace(" ", "[space]") + " ");
            }
            Console.WriteLine();

            // One row per character, each cell padded to its column header's width.
            foreach (TUnencoded unencodedObject in unencodedArray)
            {
                string stringCharToEncode = unencodedObject.ToString();
                foreach (string columnHeader in mappings.Keys)
                {
                    int columnWidth = columnHeader.Length + 1;
                    Func<TUnencoded, string> encoder = mappings[columnHeader];
                    string encodedString = encoder(unencodedObject);

                    // ASSUMPTION: Column header will always be wider than encoded string.
                    Console.Write(encodedString.Replace(" ", "[space]").PadRight(columnWidth));
                }
                Console.WriteLine();
            }
        }
    }
}
A bit late to answer, but this is the best solution if you are using HttpActionContext in your filter. You can always use it as mentioned here: