The origin of counting from zero in programming languages

Tags: array, history, indexing, programming-languages

This is a question which I have wondered (and been asked) about for a long time.

In (most? all?) programming languages, indexes for arrays, strings, etc. begin at zero. I recognize this became a convention over time and was adopted in many languages, but can anyone point to its origin?

I thought, perhaps, it had to do with everything being rooted in binary, but I am not sure why that would carry over to decimal notation: why not start an index at 1?

Does anyone have historical knowledge of programming languages where the decision to begin indexes at zero may have been explained?

Thank you!

EDIT: The Dijkstra writings are further helpful from a mathematical standpoint, but even as he noted, not all languages are zero-indexed. WBT's explanation also makes sense as to why one would start with zero based on memory addresses. (I know some languages handle indexing slightly differently based on array manipulation.)

I'm not necessarily looking for the why (which I very much appreciate, because it helps further an understanding) so much as the when: when did this become the convention, and can it be traced to a specific language?

So, for instance, in K&R's C, when discussing array indexes, K or R matter-of-factly explains, "Array subscripts always start at zero in C…" (p. 22). Later, in discussing a function to process character arrays: "… a more useful design would be to return the length of the line, or zero if end of file is encountered. Zero is an acceptable end-of-file return because it never is a valid line length." (p. 127)

Based on K&R, I gather (a) the convention was adopted from elsewhere, so C is not the inspiration behind zero-indexing, and (b) there are possibly deeper reasons for its use, judging from the second example. K&R is so widely regarded for its clear prose that I include it as an example of what I had hoped another language's documentation would do: explain the reason behind zero-indexing.

I think both WBT and btilly offer equally good reasons; I wondered whether anyone knew of older (pre-C?) languages whose documentation recorded the design decision. At the same time, I recognize such information may not exist.

Best Answer

It's about offsets. You have an address, which points to the location in memory where the array begins. Then to access any element, you multiply the array index by the size of the element and add it to the starting address, to find the address for that element.

The first element is at the starting point, so you multiply the size of the element by zero, getting zero, which is what you add to the starting address to find the location of the first element.
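
To make the arithmetic concrete, here is a minimal C sketch (my own illustration, not from the answer) that computes each element's address as base plus index times element size; index 0 contributes an offset of zero bytes, so it lands exactly on the starting address.

    #include <stdio.h>

    int main(void) {
        int a[4] = {10, 20, 30, 40};
        char *base = (char *)a;          /* starting address of the array */

        for (size_t i = 0; i < 4; i++) {
            /* address of element i = base + i * size of one element */
            int *p = (int *)(base + i * sizeof(int));
            printf("a[%zu] = %d at byte offset %zu\n", i, *p, i * sizeof(int));
        }
        return 0;
    }

This prints the same values as a[i], because in C the subscript a[i] is defined as *(a + i): the index is literally the offset count from the base.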

The convention spread because programmers started out in very low-level languages where memory addresses were manipulated directly, and in most cases they built up from there, keeping the same convention at each step so they wouldn't have to relearn it or make mistakes when switching between conventions. It's still important to understand how this addressing works, especially when working with lower-level languages. I agree it can be a stumbling block for people first learning to program in a higher-level language.

The Wikipedia article on this topic also cites a common machine instruction used when working "backwards" and detecting the end of a loop, namely "decrement and jump if zero."
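
As a rough sketch of how that instruction pairs with zero-based indexing (my own example, not from the article), a C loop that counts a size down to zero visits indices n-1 through 0, and the loop's exit test is exactly "has the counter reached zero?", which a compiler can lower to a decrement-and-branch idiom.

    #include <stdio.h>

    int main(void) {
        int a[] = {10, 20, 30, 40};
        size_t n = sizeof(a) / sizeof(a[0]);

        /* The test reads n, then decrements it; the loop ends once the
           counter has counted down to zero. With zero-based indexing,
           n-1 .. 0 is exactly the set of valid indices. */
        while (n--) {
            printf("a[%zu] = %d\n", n, a[n]);
        }
        return 0;
    }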

An exception: MATLAB and some other languages bucked the trend and start indexing at 1, apparently under the impression that they would be a first programming language for many of their target users, and that for those folks starting at 1 is more intuitive. This causes some frustration for the (relatively small subset of?) programmers who frequently switch between languages that start counting at different values.