If you're using Visual C++ do the following: You include intrin.h and call the following functions:
For 16 bit numbers:
unsigned short _byteswap_ushort(unsigned short value);
For 32 bit numbers:
unsigned long _byteswap_ulong(unsigned long value);
For 64 bit numbers:
unsigned __int64 _byteswap_uint64(unsigned __int64 value);
8 bit numbers (chars) don't need to be converted.
Also these are only defined for unsigned values they work for signed integers as well.
For floats and doubles it's more difficult as with plain integers as these may or not may be in the host machines byte-order. You can get little-endian floats on big-endian machines and vice versa.
Other compilers have similar intrinsics as well.
In GCC for example you can directly call some builtins as documented here:
uint32_t __builtin_bswap32 (uint32_t x)
uint64_t __builtin_bswap64 (uint64_t x)
(no need to include something). Afaik bits.h declares the same function in a non gcc-centric way as well.
16 bit swap it's just a bit-rotate.
Calling the intrinsics instead of rolling your own gives you the best performance and code density btw..
Big-Endian (BE) / Little-Endian (LE) are two ways to organize multi-byte words. For example, when using two bytes to represent a character in UTF-16, there are two ways to represent the character 0x1234
as a string of bytes (0x00-0xFF):
Byte Index: 0 1
---------------------
Big-Endian: 12 34
Little-Endian: 34 12
In order to decide if a text uses UTF-16BE or UTF-16LE, the specification recommends to prepend a Byte Order Mark (BOM) to the string, representing the character U+FEFF. So, if the first two bytes of a UTF-16 encoded text file are FE
, FF
, the encoding is UTF-16BE. For FF
, FE
, it is UTF-16LE.
A visual example: The word "Example" in different encodings (UTF-16 with BOM):
Byte Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
------------------------------------------------------------
ASCII: 45 78 61 6d 70 6c 65
UTF-16BE: FE FF 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65
UTF-16LE: FF FE 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00
For further information, please read the Wikipedia page of Endianness and/or UTF-16.
Best Answer
In C, C++
See also: Perl version