A lot of really early work was done with 5-bit Baudot codes, but those quickly became quite limiting: only 32 possible characters, so basically just upper-case letters and a few punctuation marks, with not enough "space" left over for digits.
From there, quite a few machines went to 6-bit characters. This was still pretty inadequate though -- if you wanted upper- and lower-case (English) letters plus digits, that left only two more characters for punctuation, so most character sets still had only one case of letters.
ASCII defined a 7-bit character set. That was "good enough" for a lot of uses for a long time, and has formed the basis of most newer character sets as well (ISO 646, ISO 8859, Unicode, ISO 10646, etc.).
Binary computers motivate designers to make sizes powers of two. Since the "standard" character set required 7 bits anyway, it wasn't much of a stretch to add one more bit to get a power of 2 (and by then, storage was becoming cheap enough that "wasting" a bit on most characters was more acceptable as well).
Since then, character sets have moved to 16 and 32 bits, but most mainstream computers are largely based on the original IBM PC and its 8-bit characters. Then again, enough of the market is sufficiently satisfied with 8-bit characters that even if the PC hadn't come to its current level of dominance, I'm not sure everybody would do everything with larger characters anyway.
I should also add that the market has changed quite a bit. In the current market, the character size is defined less by the hardware than by the software. Windows, Java, etc., moved to 16-bit characters long ago.
Now, the hindrance in supporting 16- or 32-bit characters stems only minimally from the difficulties inherent in 16- or 32-bit characters themselves, and largely from the difficulty of supporting i18n in general. In ASCII (for example), detecting whether a letter is upper or lower case, or converting between the two, is incredibly trivial. In full Unicode/ISO 10646, it's basically indescribably complex (to the point that the standards don't even try -- they give tables, not descriptions). Then you add in the fact that for some languages/character sets, even the basic idea of upper/lower case doesn't apply. Then you add in the fact that even displaying characters in some of those is much more complex still.
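To make the ASCII point concrete, here's a minimal C sketch (the helper names are mine, just for illustration): upper- and lower-case ASCII letters differ only in the 0x20 bit, so detection is a range check and conversion is a single bit operation.

#include <stdio.h>

/* ASCII only: 'A'..'Z' and 'a'..'z' differ in exactly the 0x20 bit. */
static int is_ascii_upper(char c) { return c >= 'A' && c <= 'Z'; }
static char to_ascii_lower(char c) { return is_ascii_upper(c) ? (char)(c | 0x20) : c; }

int main(void)
{
    printf("%c -> %c\n", 'G', to_ascii_lower('G')); /* prints: G -> g */
    return 0;
}

In full Unicode there is no such single-bit trick, which is exactly why the standards resort to tables.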
That's all sufficiently complex that the vast majority of software doesn't even try. The situation is slowly improving, but slowly is the operative word.
Is there a rough consensus on whether the bitmask 0x01 is properly said to have the "zeroth" bit set, or the "first" bit set?
Your question wouldn't have passed an honest pollster's (oxymoron?) sniff test, because it was leading. Of course, if it had, your question might not have been around long. Try this:
Q: Which bit is on in 0x01, assuming little-endian?
IMHO you would have received answers that said one or the other: bit zero, or the LSB. It is highly unlikely that any 'coder' ("I am not an animal") would have said bit one.
Is 2 to the power of 0 equal to 1, or is 2 to the power of 1 equal to 1? Humans imply zero offsets without thought, e.g., "How old are you?" or "How far is it from your house to work?"
My specific answer to this,

#define UCHAR_NTH_BIT_m(n) ((unsigned char)(1 << ((n) - 1)))

is: please don't, because no human will be looking at it, only 'coders'.
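If a macro like that is wanted at all, the conventional zero-based spelling avoids the off-by-one entirely; a sketch (the name BIT is just illustrative, not from any standard header):

#define BIT(n) (1u << (n)) /* BIT(0) == 0x01, BIT(1) == 0x02, ... */

That way "bit n" and "the bit worth 2 to the power n" are the same thing -- the zero offset humans already imply.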
Best Answer
The idea is that each iteration sets the least significant bit that isn't zero to zero, and only it. Since each iteration converts exactly one bit from 1 to 0, it'll take as many iterations as there are non-zero bits to convert all the bits to 0 (and thus v == 0 and the loop finishes).

So, how does this work? Let's say that the bit at index n is 1 and that the bits at indexes 0 up to n-1 are all 0 (we'll use little-endian numbering, so index 0 is worth 1, index 1 is worth 2, index 2 is worth 4, index 3 is worth 8, and so on). v - 1 subtracts from index 0 -- but it's 0, so it converts it to 1 and subtracts from index 1 -- but that's also 0, so it converts it to 1 and subtracts from index 2 -- and so on until we reach index n. Since index n is 1, it can subtract from it and turn it to 0, and there it stops.

So, v - 1 is like v except there are n 0s that became 1s and one 1 that became 0. In v & (v - 1) all the other bits remain as they are, the n zeros that were turned to ones remain 0 (because 0 & 1 == 0), and the one 1 that was turned to 0 stays 0 (because 1 & 0 == 0). So overall, only a single bit was changed in the iteration, and this change was from 1 to 0.
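For reference, here's the kind of loop being described -- the classic bit-counting idiom often attributed to Kernighan. The question's original code isn't quoted above, so this is a sketch of what the answer explains, with illustrative names:

/* Count the 1-bits in v: each iteration clears exactly one set bit,
   so the loop runs once per set bit. */
unsigned count_bits(unsigned v)
{
    unsigned count = 0;
    while (v != 0) {
        v &= v - 1; /* clear the least significant set bit */
        count++;
    }
    return count;
}

For example, count_bits(0x0D) takes three iterations (0x0D -> 0x0C -> 0x08 -> 0x00) and returns 3.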