Padding bits in unsigned integers and bitwise operations in C89

bit-manipulationbitwise-operatorscpadding

I have a lot of code that performs bitwise operations on unsigned integers. I wrote my code with the assumption that those operations were on integers of fixed width without any padding bits. For example an array of 32-bit unsigned integers of which all 32 bits available for each integer.

I'm looking to make my code more portable and I'm focused on making sure I'm C89 compliant (in this case). One of the issues that I've come across is possible padded integers. Take this extreme example, taken from the GMP manual:

However on Cray vector systems it may be noted that short and int are always stored in 8 bytes (and with sizeof indicating that) but use only 32 or 46 bits. The nails feature can account for this, by passing for instance 8*sizeof(int)-INT_BIT.

I've also read about this type of padding in other places. I actually read of a post on SO last night (forgive me, I don't have the link and I'm going to cite something similar from memory) where if you have, say, a double with 60 usable bits the other 4 could be used for padding and those padding bits could serve some internal purpose so they cannot be modified.

So let's say for example my code is compiled on a platform where an unsigned int type is sized at 4 bytes, each byte being 8 bits, however the most significant 2 bits are padding bits. Would UINT_MAX in that case be 0x3FFFFFFF (1073741823)?

#include <stdio.h>
#include <stdlib.h>

/* padding bits represented by underscores */
int main( int argc, char **argv )
{
    unsigned int a = 0x2AAAAAAA; /* __101010101010101010101010101010 */
    unsigned int b = 0x15555555; /* __010101010101010101010101010101 */
    unsigned int c = a ^ b; /* ?? __111111111111111111111111111111 */
    unsigned int d = c << 5; /* ??  __111111111111111111111111100000 */
    unsigned int e = d >> 5; /* ?? __000001111111111111111111111111 */
    
    printf( "a: %X\nb: %X\nc: %X\nd: %X\ne: %X\n", a, b, c, d, e );
    return 0;
}

Is it safe to XOR two integers with padding bits?
Wouldn't I XOR whatever the padding bits are?

I can't find this behavior covered in C89.

Furthermore is the c variable guaranteed to be 0x3FFFFFFF or if for example the two padding bits were both on in a or b would c be 0xFFFFFFFF?

Same question with d and e. Am I manipulating the padding bits by shifting?
I would expect to see this below, assuming 32 bits with the 2 most significant bits used for padding, but I want to know if something like this is guaranteed:

a: 2AAAAAAA
b: 15555555
c: 3FFFFFFF
d: 3FFFFFE0
e: 01FFFFFF

Also are padding bits always the most significant bits or could they be the least significant bits?

EDIT 12/19/2010 5PM EST: Christoph has answered my question. Thanks!
I had also asked (above) whether padding bits are always the most significant bits. This is cited in the rationale for the C99 standard, and the answer is no. I am playing it safe and assuming the same for C89. Here is specifically what the C99 rationale says for §6.2.6.2 (Representation of Integer Types):

Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.

Footnotes 44 and 45 mention that parity bits might be padding bits. The committee does not know of any machines with user-accessible parity bits within an integer. Therefore, the committee is not aware of any machines that treat parity bits as padding bits.

EDIT 12/28/2010 3PM EST: I found an interesting discussion on comp.lang.c from a few months ago.

One point made by Dietmar which I found interesting:

Let's note that padding bits are not necessary for the existence of trap representations; combinations of value bits which do not represent a value of the object type would also do.

Best Answer

Bitwise operations (like arithmetic operations) operate on values and ignore padding. The implementation may or may not modify padding bits (or use them internally, eg as parity bits), but portable C code will never be able to detect this. Any value (including UINT_MAX) will not include the padding.

Where integer padding might lead to problems on is if you use things like sizeof (int) * CHAR_BIT and then try to use shifts to access all these bits. If you want to be portable, either only use (unsigned) char, fixed-sized integers (a C99 addition) or determine the number of value-bits programatically. This can be done at compile-time with the preprocessor by comparing UINT_MAX against powers of 2 or at runtime by using bit-operations.

edit:

C90 does not mention integer padding at all, but as far as I can tell, 'invisible' preceding or trailing integer padding bits shouldn't violate the standard (I didn't go through all relevant sections to make sure this is really the case, though); there probaby are problems with mixed padding and value bits as mentioned in the C99 rationale because otherwise, the standard would not have needed to be changed.

As to the meaning of user-accessible: Padding bits are accessible insofar as you can alwaye get at any bit of foo (including padding) by using bit-operations on ((unsigned char *)&foo)[…]. Be careful when modifying the padding bits, though: the result won't change the value of the integer, but might create be a trap-representation nevertheless. In case of C90, this is implicitly unspecified (as in not mentioned at all), in case of C99, it's implementation-defined.

This was not what the rationale quotation was about, though: the cited architecture represents 32-bit integers via two 16-bit integers. In case of unsigned types, the resulting integer has 32 value bits and a precision of 32; in case of signed integers, it only has 31 value bits and a precision of 30: one of the sign bits of the 16-bit integers is used as the sign bit of the 32-bit integer, the other one is ignored, thus creating a padding bit surrounded by value bits. Now, if you access a 32-bit signed integer as an unsigned integer (which is explicitly allowed and does not violate the C99 aliasing rules), the padding bit becomes a (user-accessible) value bit.

PDF versions of the standard

As of ~~1st September 2014~~ September 2021, the best locations by price for the official C and C++ standards documents in PDF seem to be:

C++20 – ISO/IEC 14882:2020: 198 CHF (about $217 US) from iso.org
C++17 – ISO/IEC 14882:2017: $90 NZD (about $65 US) from Standards New Zealand
C++14 – ISO/IEC 14882:2014: $90 NZD (about $65 US) from Standards New Zealand
C++11 – ISO/IEC 14882:2011: $60 from ansi.org or $60 from Techstreet
C++03 – INCITS/ISO/IEC 14882:2003: $30 from ansi.org
C++98 – ISO/IEC 14882:1998: $80 NZD (about $60 US) from Standards New Zealand
C17/C18 – INCITS/ISO/IEC 9899:2018: $116 from INCITS/ANSI / N2176 / c17_updated_proposed_fdis.pdf draft from November 2017 (Link broken, see Wayback Machine N2176)
C11 – ISO/IEC 9899:2011: ~~$30~~ $60 from ansi.org / WG14 draft version N1570
C99 – INCITS/ISO/IEC 9899-1999(R2005): $60 from ansi.org / WG14 draft version N1256
C90 – ISO/IEC 9899:1990: $90 NZD (about $65 USD) from Standards New Zealand

Non-PDF electronic versions of the standard

C89 – Draft version in ANSI text format: (https://web.archive.org/web/20161223125339/http://flash-gordon.me.uk/ansi.c.txt)
C89 – Draft version as HTML document: (http://port70.net/~nsz/c/c89/c89-draft.html)
C90 TC1; ISO/IEC 9899 TCOR1, single-page HTML document: (http://www.open-std.org/jtc1/sc22/wg14/www/docs/tc1.htm)
C90 TC2; ISO/IEC 9899 TCOR2, single-page HTML document: (http://www.open-std.org/jtc1/sc22/wg14/www/docs/tc2.htm)

Print versions of the standard

Print copies of the standards are available from national standards bodies and ISO but are very expensive.

If you want a hardcopy of the C90 standard for much less money than above, you may be able to find a cheap used copy of Herb Schildt's book The Annotated ANSI Standard at Amazon, which contains the actual text of the standard (useful) and commentary on the standard (less useful - it contains several dangerous and misleading errors).

The C99 and C++03 standards are available in book form from Wiley and the BSI (British Standards Institute):

C++03 Standard on Amazon
C99 Standard on Amazon

Standards committee draft versions (free)

The working drafts for future standards are often available from the committee websites:

If you want to get drafts from the current or earlier C/C++ standards, there are some available for free on the internet:

For C:

ANSI X3.159-198 (C89): I cannot find a PDF of C89, but it is almost the same as C90. The only major differences are in the boilerplate and section numbering, although there are some slight textual differences
ISO/IEC 9899:1990 (C90): (Almost the same as ANSI X3.159-198 (C89) except for the frontmatter and section numbering. There is at least one textual difference in section 6.5.7 (previously 3.5.7), where "a list" became "a brace-enclosed list". Note that the conversion between ANSI and ISO/IEC Standard is seen inside this document, the document refers to its name as "ANSI/ISO: 9899/99" although this isn't the right name of the later made standard of it, the right name is "ISO/IEC 9899:1990")
TC1 for C90: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n423.pdf
There isn't a PDF link for TC2 on the WG14 website, sadly.
ISO/IEC 9899:1999 (C99 incorporating all three Technical Corrigenda): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
An earlier version of C99 incorporating only TC1 and TC2: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
Working draft for the original (i.e. pre-corrigenda) C99: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n843.htm (HTML) and http://www.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf (PDF). Note that there were two later working drafts: N869 and N878, but they seem to have been removed from the WG14 website, so this is the latest one available.
List of changes between C89/C90 and C99: http://port70.net/~nsz/c/c89/c9x_changes.html
TC1 for C99 (only the TC, not the standard incorporating it): http://www.open-std.org/jtc1/sc22/wg14/www/docs/9899tc1/n32071.PDF
TC2 for C99 (only the TC, not the standard incorporating it): http://www.open-std.org/jtc1/sc22/wg14/www/docs/9899-1999_cor_2-2004.pdf
ISO/IEC 9899:2011 (C11): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
ISO/IEC 9899:2011/Cor 1:2012 (C11's only technical corrigendum): This can be viewed at https://www.iso.org/obp/ui/#iso:std:iso-iec:9899:ed-3:v1:cor:1:v1:en but cannot be downloaded. It is the actual corrigendum, not a draft.
ISO/IEC 9899:2018 (C17/C18): https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf (N2176)
C2x work-in-progress - latest working draft as of 18th October 2020 (N2731): http://www.open-std.org/JTC1/SC22/WG14/www/docs/n2731.pdf

For C++:

ISO/IEC 14882:1998 (C++98): http://www.lirmm.fr/~ducour/Doc-objets/ISO+IEC+14882-1998.pdf
ISO/IEC 14882:2003 (C++03): https://cs.nyu.edu/courses/fall11/CSCI-GA.2110-003/documents/c++2003std.pdf
ISO/IEC 14882:2011 (C++11): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf
ISO/IEC 14882:2014 (C++14): https://github.com/cplusplus/draft/blob/master/papers/n4140.pdf?raw=true
ISO/IEC 14882:2017 (C++17): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf
ISO/IEC 14882:2020 (C++20): https://isocpp.org/files/papers/N4860.pdf
ISO/IEC 14882:2023 (C++23 work-in-progress. Working draft dated March 17 2021): http://open-std.org/JTC1/SC22/WG21/docs/papers/2021/n4885.pdf

Note that these documents are not the same as the standard, though the versions just prior to the meetings that decide on a standard are usually very close to what is in the final standard. The FCD (Final Committee Draft) versions are password protected; you need to be on the standards committee to get them.

Even though the draft versions might be very close to the final ratified versions of the standards, some of this post's editors would strongly advise you to get a copy of the actual documents — especially if you're planning on quoting them as references. Of course, starving students should go ahead and use the drafts if strapped for cash.

It appears that, if you are willing and able to wait a few months after ratification of a standard, to search for "INCITS/ISO/IEC" instead of "ISO/IEC" when looking for a standard is the key. By doing so, one of this post's editors was able to find the C11 and C++11 standards at reasonable prices. For example, if you search for "INCITS/ISO/IEC 9899:2011" instead of "ISO/IEC 9899:2011" on webstore.ansi.org you will find the reasonably priced PDF version.

The site https://wg21.link/ provides short-URL links to the C++ current working draft and draft standards, and committee papers:

https://wg21.link/std11 - C++11
https://wg21.link/std14 - C++14
https://wg21.link/std17 - C++17
https://wg21.link/std20 - C++20
https://wg21.link/std - current working draft

The current draft of the standard is maintained as LaTeX sources on Github. These sources can be converted to HTML using cxxdraft-htmlgen. The following sites maintain HTML pages so generated:

Tim Song - Current working draft - C++11 - C++14 - C++17 - C++20
Eelis - Current working draft

Tim Song also maintains generated HTML and PDF versions of the Networking TS and Ranges TS.

POSIX extensions to the C standard

The POSIX standard (IEEE 1003.1) requires a compliant operating system to include a C compiler. This compiler must in turn be compliant with the C standard, and must also support various extensions defined in the "System Interfaces" section of POSIX (such as the off_t data type, the <aio.h> header, the clock_gettime() function and the _POSIX_C_SOURCE macro.)

So if you've tried to look up a particular function, been informed "This function is part of POSIX, not the C standard", and wondered why an operating system standard was mandating compiler features and language extensions... now you know!

POSIX.1-2001: The System Interfaces section can be downloaded as a separate document from https://mirror.math.princeton.edu/pub/oldlinux/download/c951.pdf. Section 1.7 states that the relevant version of the C standard is C99.

The "Shell and Utilities" section (https://mirror.math.princeton.edu/pub/oldlinux/download/c952.pdf) mandates not only that a C99-compliant compiler should exist, but that it should be invokable from the command line under the name "c99". One way in which this can be implemented is to place a shell script called "c99" in /usr/bin, which calls gcc with the -std=c99 option added to the list of command-line parameters, and blocks any competing standards from being specified.

POSIX.1-2001 had two technical corrigenda, one dated 2002 and one dated 2004. I don't think they're incorporated into the documents as linked above. There's an online HTML version incorporating the corrigenda at https://pubs.opengroup.org/onlinepubs/009695399/ - but I should add that I've had some trouble with the search box and so using Google to search the site is probably your best bet.

There is a paywalled link to download the first corrigendum at https://standards.ieee.org/standard/1003_1-2001-Cor1-2002.html.

There is also a paywalled link for the second at https://standards.ieee.org/standard/1003_1-2001-Cor2-2004.html

There is a draft version of POSIX.1-2008 at http://www.open-std.org/jtc1/sc22/open/n4217.pdf.

POSIX.1-2008 also had two technical corrigenda, the latter of the two being dated 2016. There is an online HTML version of the standard incorporating the corrigenda at https://pubs.opengroup.org/onlinepubs/9699919799.2016edition/ - though, again, I have had situations where the site's own search box wasn't good for finding information.
There is an online HTML version of POSIX.1-2017 at https://pubs.opengroup.org/onlinepubs/9699919799/ - though, again, I recommend using Google instead of that site's searchbox. According to the Open Group website "IEEE 1003.1-2017 ... is a revision to the 1003.1-2008 standard to rollup the standard including its two technical corrigenda (as-is)". Linux manpages describe it as "technically identical" to POSIX.1-2008 with Technical Corrigenda 1 and 2 applied. This is therefore not a major revision and does not change the value of the _POSIX_C_SOURCE macro.

What are bitwise shift (bit-shift) operators and how do they work

The bit shifting operators do exactly what their name implies. They shift bits. Here's a brief (or not-so-brief) introduction to the different shift operators.

The Operators

>> is the arithmetic (or signed) right shift operator.
>>> is the logical (or unsigned) right shift operator.
<< is the left shift operator, and meets the needs of both logical and arithmetic shifts.

All of these operators can be applied to integer values (int, long, possibly short and byte or char). In some languages, applying the shift operators to any datatype smaller than int automatically resizes the operand to be an int.

Note that <<< is not an operator, because it would be redundant.

Also note that C and C++ do not distinguish between the right shift operators. They provide only the >> operator, and the right-shifting behavior is implementation defined for signed types. The rest of the answer uses the C# / Java operators.

(In all mainstream C and C++ implementations including GCC and Clang/LLVM, >> on signed types is arithmetic. Some code assumes this, but it isn't something the standard guarantees. It's not undefined, though; the standard requires implementations to define it one way or another. However, left shifts of negative signed numbers is undefined behaviour (signed integer overflow). So unless you need arithmetic right shift, it's usually a good idea to do your bit-shifting with unsigned types.)

Left shift (<<)

Integers are stored, in memory, as a series of bits. For example, the number 6 stored as a 32-bit int would be:

00000000 00000000 00000000 00000110

Shifting this bit pattern to the left one position (6 << 1) would result in the number 12:

00000000 00000000 00000000 00001100

As you can see, the digits have shifted to the left by one position, and the last digit on the right is filled with a zero. You might also note that shifting left is equivalent to multiplication by powers of 2. So 6 << 1 is equivalent to 6 * 2, and 6 << 3 is equivalent to 6 * 8. A good optimizing compiler will replace multiplications with shifts when possible.

Non-circular shifting

Please note that these are not circular shifts. Shifting this value to the left by one position (3,758,096,384 << 1):

11100000 00000000 00000000 00000000

results in 3,221,225,472:

11000000 00000000 00000000 00000000

The digit that gets shifted "off the end" is lost. It does not wrap around.

Logical right shift (>>>)

A logical right shift is the converse to the left shift. Rather than moving bits to the left, they simply move to the right. For example, shifting the number 12:

00000000 00000000 00000000 00001100

to the right by one position (12 >>> 1) will get back our original 6:

00000000 00000000 00000000 00000110

So we see that shifting to the right is equivalent to division by powers of 2.

Lost bits are gone

However, a shift cannot reclaim "lost" bits. For example, if we shift this pattern:

00111000 00000000 00000000 00000110

to the left 4 positions (939,524,102 << 4), we get 2,147,483,744:

10000000 00000000 00000000 01100000

and then shifting back ((939,524,102 << 4) >>> 4) we get 134,217,734:

00001000 00000000 00000000 00000110

We cannot get back our original value once we have lost bits.

Arithmetic right shift (>>)

The arithmetic right shift is exactly like the logical right shift, except instead of padding with zero, it pads with the most significant bit. This is because the most significant bit is the sign bit, or the bit that distinguishes positive and negative numbers. By padding with the most significant bit, the arithmetic right shift is sign-preserving.

For example, if we interpret this bit pattern as a negative number:

10000000 00000000 00000000 01100000

we have the number -2,147,483,552. Shifting this to the right 4 positions with the arithmetic shift (-2,147,483,552 >> 4) would give us:

11111000 00000000 00000000 00000110

or the number -134,217,722.

So we see that we have preserved the sign of our negative numbers by using the arithmetic right shift, rather than the logical right shift. And once again, we see that we are performing division by powers of 2.