Programming – Why Do Many Languages Treat Numbers Starting with 0 as Octal?

javascript octal python ruby

I've read Where are octals useful? and it seems that octal is something that was useful once upon a time.

Many languages treat numbers prefixed with a 0 as octal, so the literal 010 is actually 8. Among them are JavaScript, Python (2.7), and Ruby.
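
For instance, under Python 2.7 (Ruby and non-strict JavaScript behave the same way for the 010 literal):

    # Python 2.7: a leading zero silently turns a literal into octal.
    # (Under Python 3 these literals are a SyntaxError instead.)
    print(010)        # 8, not 10
    print(010 == 8)   # True
    print(0100)       # 64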

But I don't really see why these languages need octal, especially when the more likely intent behind such a literal is a decimal number with a superfluous leading 0.

JavaScript is a client-side language, where octal seems pretty useless. All three are fairly modern in other respects, and I don't think there would be much code using octal notation that would break if this "feature" were removed.

So, my questions are:

  • Is there any point in these languages supporting octal literals?
  • If octal literals are necessary, why not use something like 0o10? Why copy an old notation that overrides a more useful interpretation?
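
(For what it's worth, newer revisions did move this way: Python 3 dropped the bare leading-zero form in favor of an explicit 0o prefix, and ES2015 added 0o as well. A quick Python 3 illustration:)

    # Python 3: every non-decimal base gets an explicit prefix.
    print(0o10)        # 8
    print(0x10)        # 16
    print(0b10)        # 2
    print(0o10 == 8)   # True
    # print(010)       # the bare leading-zero form is now a SyntaxError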

Best Answer

Blind copying of C, just like ratchet freak said in his comment.

The vast majority of "language designers" these days have never seen anything but C and its copies (C++, Java, JavaScript, PHP, and probably a few dozen others I've never heard of). They have never touched FORTRAN, COBOL, LISP, PASCAL, Oberon, FORTH, APL, BLISS, or SNOBOL, to name a few.

Once upon a time, exposure to multiple programming languages was MANDATORY in the computer science curriculum, and that didn't include counting C, C++, and Java as three separate languages.

Octal was used in the early days because it made reading binary instruction values easier. The PDP-11, for example, BASICALLY had a 4-bit opcode, two 3-bit register numbers, and two 3-bit access-mechanism (addressing-mode) fields. Expressing the word in octal made everything obvious.
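
As a rough sketch of that layout (the double-operand format; the field names below are illustrative), note how the low octal digits of the instruction word map one-for-one onto the 3-bit fields:

    # Decode a PDP-11 double-operand word: a 4-bit opcode, then source and
    # destination operands, each a 3-bit access mechanism (mode) plus a
    # 3-bit register number.
    def decode_double_operand(word):
        return {
            "opcode":   (word >> 12) & 0o17,  # bits 15-12
            "src_mode": (word >> 9) & 0o7,    # bits 11-9
            "src_reg":  (word >> 6) & 0o7,    # bits 8-6
            "dst_mode": (word >> 3) & 0o7,    # bits 5-3
            "dst_reg":  word & 0o7,           # bits 2-0
        }

    word = 0o010203                     # MOV R2, R3 in its classic octal spelling
    print(oct(word))                    # 0o10203
    print(decode_double_operand(word))
    # {'opcode': 1, 'src_mode': 0, 'src_reg': 2, 'dst_mode': 0, 'dst_reg': 3}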

Because of C's early association with the PDP-11, octal notation was included in the language, as it was very common on PDP-11s at the time.

Other machines had instruction sets that didn't map well to hex. The CDC 6600 had a 60-bit word, with each word typically containing 2 to 4 instructions. Each instruction was 15 or 30 bits, widths that divide evenly into 3-bit octal digits but not into 4-bit hex digits.

As for reading and writing values, this is a solved problem, with a well-known industry best practice, at least in the defense industry. You DOCUMENT your file formats. There is no ambiguity when the format is documented, because the document TELLS you whether you are looking at a decimal number, a hex number, or an octal number.

Also note: If your I/O system defaults to leading 0 meaning octal, you have to use some other convention on your output to denote hexadecimal values. This is not necessarily a win.
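
As a small illustration of that asymmetry (Python here, since it has lived on both sides of the convention): a leading zero can mark octal output, but hex still needs its own 0x marker.

    value = 68
    print(oct(value))   # 0o104 on Python 3; Python 2 printed 0104
    print(hex(value))   # 0x44; a leading zero alone cannot mark hex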

In my personal opinion, Ada did it best: 2#10010010#, 8#222#, 16#92#, and 146 all represent the same value. (That will probably get me at least three downvotes right there, just for mentioning Ada.)
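
(Not Ada, but a quick Python check confirms that those four spellings do name the same value:)

    print(int("10010010", 2), int("222", 8), int("92", 16), 146)
    # 146 146 146 146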
