Configuration Languages – Benefits Over Source Languages

TL;DR: why do people pick YAML/JSON/INI/TOML/XML/plain text to configure applications/packages instead of defining the configuration in source files written in the same language as the application/package? I can see some obvious reasons that apply in particular cases:

  • Your code is written in multiple languages and they all have to be able to read the same config
  • The language you are using makes it cumbersome to describe data structures this way: if there aren't data literals and everything is SomeOOInterface.add(foo) for 100 lines, it's easier and clearer to just parse a data format
  • Configuring a pre-compiled binary where the consumer of the thing can't just alter the source
  • For security reasons you don't want the people configuring the thing to be able to run arbitrary code (from JonasH in the comments)

Just to name a few. Yet I often see these choices being made even in contexts where none of those hold: JSON configs in JavaScript projects. Pip using a plain text file with an ad hoc format for dependency management in Python projects. Situations where the people writing the code and the people doing the configuring are the same people.

For contrast, I called out JavaScript a minute ago, but many JavaScript tools will actually accept JavaScript source files for configuration. The built-in Swift package manager has the developer configure projects in an actual .swift file. LISP is the obvious trope-namer here: everything is data, so the division between data and code doesn't even exist.

Configuration languages vary in power, but none of them is as expressive as the source language; JSON doesn't even allow comments. There are times when the ability to perform logic, assign variables, annotate metadata, and so on is useful if not necessary. The arbitrary distinction between data and code may be fine for a textbook, but it seems to break down in practice in this particular case.

So when a configuration language is being used and none of the reasons outlined above applies, is it just voodoo chicken coding based on examples where that choice makes sense for the reasons stated? Or is there something I'm missing here?

EDIT

Just discovered this gem: https://matt-rickard.com/heptagon-of-configuration

Best Answer

To perhaps state the obvious, configuration is a different activity than coding. In my experience, configuration is something that is applied at the point of deployment. For example, you might need to configure the URI of a dependency differently depending on where the application is running. This configuration step is something that needs to happen after the code is reviewed and tested. That is, no one should be writing untested code and applying it in production or even formal testing.

I would say that's the primary reason. Having a fully Turing-complete language for configuration is a problem, not a solution. Introducing untested code at configuration time is irresponsible. As a developer, you want your configuration to have limited expressiveness because it allows you to know the bounds of what configuration can be applied at deployment.
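To make "limited expressiveness" a bit more concrete, here is a minimal Python sketch of a loader that accepts only a fixed set of keys and types from a JSON file. The key names (`service_url`, `timeout_seconds`, `verbose`) and the `load_config` helper are hypothetical, chosen only for illustration; a real application would likely have a richer schema.

```python
import json

# Hypothetical schema: the only keys a deployer may set, with expected types.
ALLOWED_KEYS = {"service_url": str, "timeout_seconds": int, "verbose": bool}


def load_config(text):
    """Parse JSON config text and reject anything outside the known bounds."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")

    # Anything the application does not know about is an error, not a feature.
    unknown = set(data) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")

    for key, value in data.items():
        expected = ALLOWED_KEYS[key]
        # bool is a subclass of int in Python, so rule it out for int fields.
        if expected is int and isinstance(value, bool):
            raise ValueError(f"{key} must be of type int")
        if not isinstance(value, expected):
            raise ValueError(f"{key} must be of type {expected.__name__}")
    return data
```

Because the loader only ever returns values of known types under known keys, the set of things a deployment can change is something you can enumerate and test in advance, which is not the case when the configuration is itself a program.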

As a corollary to that, it's useful to clearly delineate configuration from code. You can name your code artifacts so that they say 'configuration' but apart from that, if your configuration is code there's no clear distinction between what is configuration and what is code. I've been tasked with fixing source where there was no configuration and the developers had hardcoded things like hostnames in variables. It was a mess. I take it that you are suggesting this would be somehow isolated in its own source files but that's a paper-thin wall, IMO. I've regularly dealt with stakeholders who try to work around a proper software development lifecycle (SDLC) by using support staff to make changes (i.e., no requirements or testing.) The approach you are suggesting would be perfect for that kind of abuse.

I don't want to rehash too much of what the other answers have stated but you note:

Just to name a few. Yet I often see these choices being made even in contexts where none of those hold

Which suggests that there are real deployment scenarios where security isn't a consideration. For non-trivial systems, I reject that notion. The use of executable code for configuration clearly has security risks, and you aren't denying that. The idea that such concerns can be ignored in various deployment scenarios is a dangerous way to think about security. At the very least, the approach you are talking about will make it more difficult to ensure that an application deployment has not been modified improperly.
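As a rough illustration of the risk, the Python sketch below loads the same setting two ways. The `CODE_CONFIG` snippet, the `record` hook, and both loader functions are hypothetical and exist only for this example; the side effect here is a harmless list append standing in for whatever a config author might actually put there.

```python
import json

# Collects evidence of anything the config did besides define values.
side_effects = []

# Hypothetical "configuration as code": a snippet a deployer could edit.
CODE_CONFIG = '''
HOSTNAME = "db.example.invalid"
record("arbitrary code ran while loading config")
'''


def load_code_config(source):
    """Execute the config source; whatever it does happens at load time."""
    namespace = {"record": side_effects.append}
    exec(source, namespace)
    # Keep only the upper-case names as the resulting settings.
    return {k: v for k, v in namespace.items() if k.isupper()}


def load_data_config(text):
    """Parse the config as JSON; the text is treated as inert data."""
    return json.loads(text)
```

Loading `CODE_CONFIG` with `load_code_config` yields the hostname but also runs the extra statement, while passing the same text to `load_data_config` fails to parse at all. With the data format, reviewing a deployed config means checking values; with the executable one, it means auditing a program.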
