Config Files: INI, XML, JSON, YAML, TOML
— — Don Parakin — coding
Working with Hugo lately has me thinking about configuration file formats such as INI, XML, JSON, YAML, and TOML. Here are some thoughts on which one to pick.
Hugo lets you choose to use JSON, YAML, or TOML. I chose TOML (see below). But this post is not about choosing a format for Hugo. Instead, it is about things to consider when choosing a configuration format for an app you are developing.
Config files are one of the most basic requirements of non-trivial apps. Thou shalt not hardcode values in your code, one of the core commandments of coding, dictates the need for them. This is especially true for values that will change from one deployment environment to another (dev, test, staging, prod).
Needs
First, here are some generic considerations that will influence your choice:
Are your users (those doing the configuring) more like clickers or coders? Clickers include business users, Windows sys admins, and anyone else who would be dangerous or annoyed editing a text file. If you have clickers, you may need to build a user interface or “wizard” instead.
Is your config data simple or complex? Simple means just strings. The app may use these strings as-is, convert them to other types (number, boolean, etc), or split them into simple arrays. More complex data often means needing support for more data types (converted by the parser, not the app) as well as collections such as sets (arrays) and maps. Even more complex data means needing maps of sets, sets of maps, maps of sets of maps, etc.
Does your app’s programming language have an existing robust library for parsing a file format? If yes, favour that format. If not, avoid building a library yourself just to support a file format (it’ll be harder to do than you first think).
Config files are for both humans and code. For humans, they must be sufficiently easy to read and edit. When errors are made, humans need a sufficiently clear indication of what failed.
Humans need to be able to add comments in config files. The app developer should add helpful instructions to the initial or sample config file. The configurer should, when appropriate, add explanations of the values that were set. And the configurer may want to toggle between different values by commenting one line and uncommenting another. Stressing this need for commenting wouldn’t be so necessary if one popular format (um, JSON) supported it.
Code just needs to get the values in the type and structure it needs.
Candidates
Environment Variables
Environment variables (wiki) are a feature of operating systems and their command shells. They’ve been around since 1979 in Unix and 1982 in Windows (DOS).
A common pattern is using a shell script that first sets all necessary environment variables before invoking the app’s executable binary, which retrieves the values as it initializes itself. Use export in Unix (bash) or set in Windows. In this case, the shell script acts as the config file.
If you decide not to use environment variables for all of your configuration values, you might still use one (or a command parameter) for just one value: the file system path to the configuration file.
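As a rough sketch of that last idea in Python: the app looks up the config file path in one environment variable and falls back to a default. The names APP_CONFIG and APP_DEBUG and the default path are made up for this example, not a convention of any tool.

```python
import os

# APP_CONFIG and APP_DEBUG are hypothetical names chosen for this sketch.
DEFAULT_CONFIG_PATH = "config.ini"


def config_path(environ=os.environ):
    """Return the config file path, preferring the environment variable."""
    return environ.get("APP_CONFIG", DEFAULT_CONFIG_PATH)


def debug_enabled(environ=os.environ):
    """Environment variables are always strings, so the app converts them."""
    value = environ.get("APP_DEBUG", "false")
    return value.strip().lower() in ("1", "true", "yes", "on")


print(config_path())
print(debug_enabled())
```

Passing the environment in as a parameter is only there to make the functions easy to try with a plain dict.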
Yourlang
Also worth considering is using your app’s programming language if it is a dynamic language (not compiled & linked before run-time). Your app gets the config by invoking the contents of a specific file as code. If you keep to a tiny subset of the language, it can be simple enough that non-coders can configure with ease.
For Python, an example is Flask config: it is just Python code. If your config file is outside your sys.path, you may have to use importlib to invoke it.
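Here is a minimal sketch of the importlib approach, assuming the config is an ordinary Python file somewhere on disk. The file name settings.py, the module name app_config, and the setting names are invented for the example; this is not how Flask itself loads config.

```python
import importlib.util
import os
import tempfile


def load_config(path):
    """Execute the Python file at `path` and return it as a module object."""
    spec = importlib.util.spec_from_file_location("app_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Write a tiny sample config so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.py")
    with open(path, "w") as f:
        f.write("DEBUG = False\nTV_SHOWS = ['Seinfeld', '24', '90210']\n")
    config = load_config(path)

print(config.DEBUG)
print(config.TV_SHOWS)
```

Keep in mind that the config file is executed as code, so this only suits situations where whoever edits the file is trusted.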
For Javascript, an example is Webpack: webpack.config.js is just JS code that exports an object containing the config data.
For PHP, there is a big win here. PHP suffers from amnesia: it “forgets” everything after each web request so it must re-read config files for every request. If your config is PHP code, for the first request PHP will parse it, convert it to bytecode, cache it in memory, and run it. For all requests after that, it only needs to get it from cache and run it. Huge!
INI
Next is the popular INI file format (wiki). It goes back to at least 1981 with the release of MS-DOS. It’s a key / value format where keys can (optionally) be grouped into sections. Many parsers exist but with inconsistent implementations. Some return string values only while others try to convert unquoted strings to booleans or numbers.
Use this if your config values are simple, your language has an existing parser, and your language is not dynamic so config-as-code is not possible (or desirable).
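For a feel of what "strings only" means in practice, here is a small sketch using Python's built-in configparser. The section and key names are invented for the example. This parser hands back strings, and the app asks for a conversion when it wants one.

```python
import configparser

# Hypothetical sample settings, inlined so the sketch runs on its own.
SAMPLE = """
[server]
; comments are allowed
host = example.local
port = 8080
debug = yes
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

raw_port = parser["server"]["port"]            # plain lookup: always a string
port = parser.getint("server", "port")         # explicit conversion to int
debug = parser.getboolean("server", "debug")   # understands yes/no, on/off, true/false, 1/0

print(repr(raw_port), port, debug)
```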
XML
For more complex values, there is the formerly popular XML file format (wiki). Yup, formerly popular. For about 10 years after its first draft appeared in 1996 (XML 1.0 became a W3C Recommendation in 1998), it was wildly popular (more so than, say, JSON now). It was intended more for data interchange but was also used for config. Now loathing it is popular, as is often the fate of older tech.
Does XML deserve to be despised? Yes, a little. But much of the blame goes to pedantic coders who insisted on using child tags instead of attributes and very long names for both tags and attributes. Why? Probably for some imagined abstract YAGNI-ish benefit. The result was XML that was often wildly verbose and bloated.
For example, here’s a comparison of brief XML vs JSON:
<config>
  <post title="Config files: ..."
        date="2019-08-26"
        draft="false">
    <tags>
      <t>tag1</t>
      <t>tag2</t>
    </tags>
  </post>
  ...
</config>

[
  { "title": "Config files: ...",
    "date": "2019-08-26",
    "draft": false,
    "tags": [
      "tag1",
      "tag2"
    ]
  },
  ...
]
The XML takes about 25% more keystrokes yet the same number of lines. Worthy of the loathing it gets? You decide.
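To check that the two really carry the same data, here is a rough Python sketch that parses a trimmed version of each (shortened title, no elided entries) with the standard library. The helper posts_from_xml is made up for this example. One difference shows up: XML attribute values all arrive as strings, so the app has to convert draft itself, while JSON delivers a real boolean.

```python
import json
import xml.etree.ElementTree as ET

XML_TEXT = """
<config>
  <post title="Config files" date="2019-08-26" draft="false">
    <tags><t>tag1</t><t>tag2</t></tags>
  </post>
</config>
"""

JSON_TEXT = """
[
  { "title": "Config files", "date": "2019-08-26", "draft": false,
    "tags": ["tag1", "tag2"] }
]
"""


def posts_from_xml(text):
    """Hypothetical helper: turn each <post> element into a dict."""
    posts = []
    for post in ET.fromstring(text.strip()).iter("post"):
        item = dict(post.attrib)                   # attribute values are all strings
        item["draft"] = item["draft"] == "true"    # the app converts the boolean
        item["tags"] = [t.text for t in post.iter("t")]
        posts.append(item)
    return posts


xml_posts = posts_from_xml(XML_TEXT)
json_posts = json.loads(JSON_TEXT)
print(xml_posts == json_posts)
```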
JSON
Also for complex data, there is JSON (wiki). It was devised to replace XML in AJAX calls in web apps. So its original intended use was for data interchange, not for config files.
But JSON does not allow comments! That’s fine for data interchange but definitely not fine for config files! Would a programming language be any good if it didn’t allow comments in the source code? Of course not. Well, neither is a config file format that doesn’t.
Because JSON does not allow comments, please do not choose JSON as your app’s config file format. For data interchange it’s okay; for config files it’s not.
Note that VSCode has extended JSON to JSONC or “JSON with comments”. But until this becomes widely used, stay away from JSON.
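To see the restriction in practice, here is a small sketch with Python's built-in json module, which follows the standard and rejects a file containing a comment. The helper try_parse and the debug setting are made up for the example.

```python
import json

WITH_COMMENT = """
{
  // toggle for local development
  "debug": true
}
"""


def try_parse(text):
    """Return the parsed data, or a short message if the parser rejects it."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return f"rejected: {err.msg}"


result_with = try_parse(WITH_COMMENT)
result_without = try_parse('{"debug": true}')

print(result_with)
print(result_without)
```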
YAML
Also for complex data, there is YAML (wiki). It was originally intended as a markup language (like HTML) but later repurposed for data. Config files are a popular use for it.
It’s not a bad choice as its minimal syntax is nice. It is sensitive to indentation so it can give unexpected results or errors if indented improperly. This can frustrate less precise or less technical configurers.
It also converts values to data types (string, number, boolean, date, etc) depending on the value. That’s not always okay. An app that worked for years might mysteriously stop working after a seemingly irrelevant config change.
Consider the following YAML file:
tv_shows:
  - Seinfeld
  - 24
  - !!str 90210
Seinfeld is returned as type string, as expected.
24 is not; it’s returned as an integer. This could cause the app to crash, not when the YAML is parsed but when the app tries to use that value (the error message likely won’t be too helpful).
90210 is returned as type string, but only because the configurer knew and remembered to explicitly specify the data type using !!str.
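One way an app might protect itself is to check types right after loading, so the failure names the offending entry. The sketch below does not parse YAML (that needs a third-party library in Python); it starts from the list a parser would return according to the description above. The helper validate_strings is made up for the example.

```python
# Assumed result of parsing the YAML above: the unquoted 24 arrives as an int.
tv_shows = ["Seinfeld", 24, "90210"]

# Typical app code such as ", ".join(tv_shows) would raise a TypeError here,
# far away from the config file that caused it.


def validate_strings(name, values):
    """Return a list of readable problems; an empty list means all is well."""
    problems = []
    for index, value in enumerate(values):
        if not isinstance(value, str):
            problems.append(
                f"{name}[{index}] should be a string, "
                f"got {type(value).__name__} {value!r}"
            )
    return problems


problems = validate_strings("tv_shows", tv_shows)
print(problems)
```

Running the check at startup turns a mysterious crash later on into a message that points at the config entry.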
If you can live with the infrequent troubles that indentation and data typing will give, YAML is not a bad choice. If you want something a little more precise, see TOML.
TOML
Also for complex data, there is TOML (wiki). It also aims to be minimal but not so minimal that you can get into trouble with indentation and data typing. Although it needs a few more keystrokes, it is much simpler than YAML: compare all the bells & whistles in the YAML 1.2 spec vs the TOML spec.
It’s a great choice since it is fairly minimal, simple, and explicit. Indentation is ignored by the parser. Add indentation, if you want, to improve readability for humans.
It also has explicit syntax to get the data type you desire: strings, integers, floats, booleans, dates, times, etc. A little more typing (to quote strings, for example) but no surprises.
tv_shows = [
    "24",
    "Seinfeld",
    "90210",
]
One vote in favour of TOML came from Python: PEP 518 selected TOML as the file format for pyproject.toml, Python’s new consolidated configuration file for packages (and applications). The PEP process is a thoughtful one for making good design decisions.
Solution
As usual, the best candidate depends on the specifics of your app & situation.
For Hugo, I chose TOML, not so much because the situation called for it but because it’s better to learn and use one format fluently than two confusingly. I’ll use TOML whenever I have the choice.
For Python, I’ll choose INI for quicker & simpler apps because configparser is included in the standard library. For other situations, I’ll choose TOML.