How did separation of code and data become a practice


Please read the question carefully: it asks how, not why.

I recently came across this answer, which suggests using a database to store immutable data:

It sounds like many of the magic numbers you describe – particularly if they are part dependent – are really data, not code. […] It may mean an SQL type database, or it may simply mean a formatted text file.

It would seem to me that if you have data that is part of what your program does, then the thing to do is to put it in the program. For example, if your program's function is to count vowels, what's wrong with having vowels = "aeiou" in it? After all, most languages have data structures designed for precisely this use. Why would you bother to separate data by putting it in a "formatted text file", as suggested above? Why not just make that text file formatted in your programming language of choice? Now is it a database? Or is it code?
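For concreteness, here is a minimal Python sketch of the two approaches being compared; the file name vowels.txt is hypothetical:

```python
# Data embedded in the code: a plain string literal.
VOWELS = "aeiou"

def count_vowels(text: str) -> int:
    """Count the characters of `text` that are vowels."""
    return sum(1 for ch in text.lower() if ch in VOWELS)

# Data separated out: the same string read from a one-line text file.
def load_vowels(path: str = "vowels.txt") -> str:
    with open(path) as f:
        return f.read().strip()

print(count_vowels("Programming"))  # 3
```

Either way the program does the same thing; the question is only where the string lives.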

I'm sure some will think this is a dumb question, but I ask it in all seriousness. I feel like "separate code and data" is emerging culturally as some sort of self-evident truth, along with other obvious things like "don't give your variables misleading names" and "don't avoid using whitespace just because your language considers it insignificant".

Take, for example, this article: The Problem with Separating Data from Puppet Code. The Problem? What problem? If Puppet is a language for describing my infrastructure, why can't it also describe that the nameserver is 8.8.8.8? It seems to me that the problem isn't that code and data are mingled,[1] but that Puppet lacks sufficiently rich data structures and ways to interface to other things.

I find this shift disturbing. Object oriented programming said "we want arbitrarily rich data structures", and so endowed data structures with powers of code. You get encapsulation and abstraction as a result. Even SQL databases have stored procedures. When you sequester data into YAML or text files or dumb databases as if you are removing a tumor from the code, you lose all of that.

Can anyone explain how this practice of separating data from code came to be, and where it's going? Can anyone cite publications by luminaries, or provide some relevant data that demonstrates "separate code from data" as an emerging commandment, and illustrates its origin?

[1] If one can even make such distinctions. I'm looking at you, Lisp programmers.

Best Answer

There are many good reasons to separate data from code, and some reasons not to. The following come to mind.

Timeliness. When is a data value known? Is it when the code is written, compiled, linked, released, licensed, configured, at startup, or while running? For example, the number of days in a week (7) is known early, but the USD/AUD exchange rate will be known quite late.
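A minimal Python sketch of the two extremes; the environment variable name USD_AUD_RATE is an assumption for illustration:

```python
import os

# Known when the code is written: nothing is lost by hard-coding it.
DAYS_IN_WEEK = 7

# Known only while running: it has to come from outside the program.
def usd_to_aud(amount_usd: float) -> float:
    rate = float(os.environ["USD_AUD_RATE"])  # fails loudly if unset
    return amount_usd * rate
```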

Structure. Is this a single data item, set according to a single consideration, or might it be inherited or part of a larger collection of items? Formats like YAML and JSON make it possible to combine values from multiple sources. Perhaps some things that initially seem immutable are better made accessible as properties in a configuration manager.
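As a sketch of that layering, with made-up keys and values, later sources can override earlier ones much as YAML-based tools merge defaults, site-wide settings, and per-host overrides:

```python
defaults = {"nameserver": "8.8.8.8", "timeout_s": 30}
site_wide = {"timeout_s": 10}
per_host = {"nameserver": "10.0.0.53"}

# Later dictionaries win, so per-host values override site-wide ones,
# which in turn override the built-in defaults.
config = {**defaults, **site_wide, **per_host}
print(config)  # {'nameserver': '10.0.0.53', 'timeout_s': 10}
```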

Locality. If all the data items are stored in a limited number of places, it is far easier to manage them, particularly if some might need to be changed to new (immutable) values. Editing source code just to change data values introduces the risk of inadvertent changes and bugs.
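One way to get that locality in Python is a single frozen object where every value lives; the names here are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """The one place data values live; frozen keeps them immutable."""
    retry_limit: int = 3
    page_size: int = 50
    nameserver: str = "8.8.8.8"

SETTINGS = Settings()

def fetch_all(pages: int) -> None:
    for attempt in range(SETTINGS.retry_limit):  # no literal 3 scattered about
        ...
```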

Separation of concerns. Getting algorithms to work correctly is best separated from consideration of what data values to use. Data is needed to test algorithms, not to be part of them. See also http://c2.com/cgi/wiki?ZeroOneInfinityRule.
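Echoing the vowels example from the question, a small sketch of keeping the data out of the algorithm so that tests can inject their own values:

```python
def count_matching(text: str, alphabet: str) -> int:
    """The algorithm takes its data as a parameter instead of owning it."""
    return sum(1 for ch in text if ch in alphabet)

# Tests supply the data; the algorithm itself never changes.
assert count_matching("banana", "aeiou") == 3  # vowels
assert count_matching("banana", "bn") == 3     # consonants
```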

In answer to your question, this is not a new thing. The core principles have not changed in more than 30 years, and have been written about repeatedly over that time. I can recall no major publications on the topic, as it is generally not considered controversial, just something to explain to newcomers. There is a bit more here: http://c2.com/cgi/wiki?SeparationOfDataAndCode.

My personal experience is that the importance of this separation in a particular piece of software becomes greater over time, not less. Values that were hard-coded are moved into header files, compiled-in values are moved into configuration files, and simple values become part of hierarchical and managed structures.

As to trends, I haven't seen any major changes in attitude amongst professional programmers (10+ years), but the industry is increasingly full of youngsters and many things I thought were known and decided keep getting challenged and reinvented, sometimes out of new insights but sometimes out of ignorance.
