Programming Languages – Why Many Languages Have Only Arrays and Hashes

data-structures, data-types, language-design, programming-languages

Many programming languages have only those two structures, and even some languages that have more structures still provide special syntax only for those two; usually, [] and {}. Why is this? Is there anything special about those datatypes that is necessary for the completeness of the language?

Best Answer

There's nothing that particularly forces a language to have arrays and hashes as fundamental datatypes. Indeed, many don't (especially older languages). However, there are a few fundamental concepts involved which indicate that these sorts of mappings make for good data structures.

Firstly, the ordered collection where you perform lookups by index number. This is a very common structure, useful whenever you've got a bunch of things and want to be able to walk through them one by one or look them up by some index. The key reason it is so popular is that the variant where the collection is kept compact in a contiguous region of memory (the array) is very efficient and fast on modern hardware. That efficiency is why arrays are very common (though not universal). The major alternative to the array is the linked list, which is also quite common; linked lists have linear-time lookup (whereas arrays have constant-time lookup) but super-cheap insertion and deletion in the middle of the sequence.
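The trade-off above can be sketched in a few lines. This is an illustrative toy (the `Node`, `nth`, and `insert_after` names are not from the answer): a Python list plays the role of the contiguous array with constant-time indexing, and a hand-rolled singly linked list shows linear-time lookup but constant-time splicing after a known node.

```python
# Toy singly linked list, to contrast with a contiguous array (Python list).
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def nth(head, i):
    """Walk i links from the head: linear-time lookup."""
    node = head
    for _ in range(i):
        node = node.next
    return node.value

def insert_after(node, value):
    """Splice a new node in after `node`: constant-time insertion."""
    node.next = Node(value, node.next)

# Build 10 -> 20 -> 30, then insert 15 after the first node.
head = Node(10, Node(20, Node(30)))
insert_after(head, 15)          # list is now 10 -> 15 -> 20 -> 30

array = [10, 20, 30]
assert array[1] == 20           # O(1): index into contiguous memory
assert nth(head, 1) == 15       # O(i): must walk the chain
```

Inserting 15 into the middle of `array` would instead shift every later element over, which is the cost the linked list avoids.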

The second major category of collection is a mapping from values of one type (that supports an equality test) to another type. This is a way of realizing a whole class of very simple functions in a memory-based data structure, and it is superb for implementing all sorts of other basic datatypes. The name of these things varies (e.g., “dictionary”, “associative array”), as does the implementation strategy; the three most common implementation strategies are the record/struct, the mapping tree, and the hash table. Structs are very common (and are in fact a partial hybrid between dictionaries and arrays, where the key is mapped to an offset into an array/memory block). Trees used to be very common, but have become less so as it turns out they tend to have surprisingly poor performance (their memory access pattern works poorly with the way CPU memory cache predictors work, which is unfortunate). Hash tables, which were relatively uncommon a few decades ago, work pretty well: they've got reasonable memory access patterns and are easy to implement well (which is definitely not true of trees!). Their major downside is that they don't guarantee the order of iteration (though that is fixable with some extra complexity in the data structure design).
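To make the "easy to implement well" point concrete, here is a minimal separate-chaining hash table sketch (the `HashTable` class and its method names are illustrative, not from the answer): hash the key, pick a bucket, and scan a short chain using the key type's equality test.

```python
# Minimal separate-chaining hash table: each bucket is a small list of
# (key, value) pairs, found by hashing the key to a bucket index.
class HashTable:
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # keys only need an equality test
                bucket[i] = (key, value)  # replace existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

t = HashTable()
t.put("en", "hello")
t.put("fr", "bonjour")
assert t.get("fr") == "bonjour"
```

Note that iterating the buckets would visit entries in hash order, not insertion order, which is exactly the iteration-order caveat mentioned above; real implementations that do preserve order (such as CPython's dict) add an extra insertion-ordered entry array to the design.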

So, the real thing that languages are providing is ℤ⁺→α maps and α⁼→β maps (that is, from indices to values, and from equality-comparable keys to values). These are both generally very useful! One is normally done with arrays, because they are easy to implement and highly efficient for lookup (typically the most common operation), and the other is normally done with hash tables (or structures), again because they are easy to implement and usually efficient for lookup. Why these two particular maps? They turn out to be sufficient for creating a great many other structures with minimal extra code (which in turn means minimal extra mistakes).
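The "minimal extra code" claim is easy to demonstrate in a language whose [] and {} literals are exactly these two maps. A sketch of three other structures falling out almost for free:

```python
# A stack is just an array used from one end.
stack = []
stack.append("a")
stack.append("b")
assert stack.pop() == "b"

# A multiset (bag) is a hash from element to count.
counts = {}
for word in ["spam", "eggs", "spam"]:
    counts[word] = counts.get(word, 0) + 1
assert counts["spam"] == 2

# A directed graph is a hash from vertex to an array of neighbours.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
assert "c" in graph["a"]
```

Queues, sets, sparse matrices, and trees admit equally short encodings on top of the same two primitives, which is much of why languages single them out for literal syntax.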