Tips for choosing the right configuration file format

Regardless of the programming language, the application you are building will need to store configuration data. Here are some questions you should ask before choosing a file format.

Comments

Comments make a configuration more informative. They help readers understand an option by describing what it represents. When configuration data is very technical, comments become vital to understanding the defined options. A comment can also hint at allowed values, such as the possible values, fixed ranges, or the expected value format. Keep in mind that not every format supports them: YAML, TOML and INI allow comments, while standard JSON does not.
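
As a rough illustration, here is a commented INI-style fragment read with Python's standard configparser module. The section and option names (server, log_level, timeout) are made up for this sketch and do not come from any real application.

```python
import configparser

# Hypothetical configuration text; the option names are illustrative only.
CONFIG_TEXT = """
[server]
# Allowed values: debug, info, warning, error
log_level = info
# Request timeout in seconds, expected range 1-300
timeout = 30
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Full-line comments are skipped by the parser; only the options remain.
log_level = parser["server"]["log_level"]
timeout = parser["server"].getint("timeout")
print(log_level, timeout)
```

The comments cost nothing at load time, yet they tell the next reader which values are acceptable.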

You can still make a configuration informative without comments by using friendly, self-explanatory option names. However, it is also important to keep configuration keys simple and avoid verbosity. Sometimes it is better to keep a technical option name than a human-readable one.

Reuse of configuration values

Sometimes several configuration options share the same value. Duplicating the value in each place is a poor solution: whenever the value changes, many options must be changed, which can lead to errors in the configuration. One of the reasons YAML is loved is that it supports this kind of reuse through anchors and aliases.
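
Value reuse is not limited to YAML. The sketch below shows the same idea with INI-style text and the ExtendedInterpolation class from Python's standard configparser module, which needs no third-party package. The section names and paths are hypothetical.

```python
import configparser

# Hypothetical configuration: base_dir is defined once and referenced twice.
CONFIG_TEXT = """
[paths]
base_dir = /srv/app

[logging]
log_dir = ${paths:base_dir}/logs

[cache]
cache_dir = ${paths:base_dir}/cache
"""

parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(CONFIG_TEXT)

# References are resolved when a value is read.
log_dir = parser["logging"]["log_dir"]
cache_dir = parser["cache"]["cache_dir"]
print(log_dir)
print(cache_dir)
```

Changing base_dir in one place now updates every option that refers to it.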

Multiple Configuration files

Linking from one configuration file to another is a great solution when the configuration data grows too big. Splitting the configuration across multiple files then becomes a requirement, and it gives us more modular and reusable configuration files. However, how do you merge them into one configuration?

Even if this support is not native to a format, many frameworks and libraries allow merging multiple configuration files into one. Check whether your project needs modular configuration files and, if so, whether a library exists that can merge files in your format of choice.
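
When no library is available, a merge can be written by hand. The following is a minimal sketch, assuming the files have already been parsed into dictionaries and that values from the later file should win. The function name merge_configs and the two JSON strings, which stand in for file contents, are invented for the example.

```python
import json


def merge_configs(base, override):
    """Return a new dict in which values from `override` take precedence."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars and lists from the override replace the base value.
            merged[key] = value
    return merged


# Stand-ins for the contents of two hypothetical files.
base_text = '{"services": {"login": {"ip": "127.0.0.1", "port": 8080}}, "debug": false}'
local_text = '{"services": {"login": {"port": 9090}}, "debug": true}'

config = merge_configs(json.loads(base_text), json.loads(local_text))
print(config)
```

Replacing lists wholesale rather than concatenating them is a design choice; a real project may want a different rule.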

Syntax

While the JSON format has a syntax similar to languages like Java and JavaScript, the YAML syntax might feel natural to a Python developer. Does the syntax seem like an important factor? Every format has its pros and cons. While some hate the braces, quotes and comma separators in JSON, others hate the indentation YAML uses for nested values. Depending on the preferences of the team or the project, syntax might be an important factor.

Dynamic Variables

Some configuration data is sensitive and should not be stored in the configuration file. When using a version control system, it is common to commit the configuration file with the rest of the code base, so credentials, for example, should not be stored in it. If they are stored there anyway, the file should not be committed to the version control system.

A common way to provide dynamic configuration values is via environment variables. Another option is to keep these values in external storage such as a database or an encrypted data store. However, this adds processing delay when the configuration is loaded.
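
The environment variable approach could look roughly like this in Python. The variable names APP_DB_HOST and APP_DB_PASSWORD, the DEFAULTS dictionary and the load_settings function are all assumptions made for the sketch.

```python
import os

# Non-sensitive defaults that could live in a committed configuration file.
DEFAULTS = {"db_host": "127.0.0.1", "db_port": "1234"}


def load_settings(environ=None):
    """Combine committed defaults with secrets taken from the environment."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    # APP_DB_HOST and APP_DB_PASSWORD are hypothetical variable names.
    settings["db_host"] = environ.get("APP_DB_HOST", settings["db_host"])
    password = environ.get("APP_DB_PASSWORD")
    if password is None:
        # Fail early instead of running with a missing credential.
        raise RuntimeError("APP_DB_PASSWORD is not set")
    settings["db_password"] = password
    return settings


# A plain dict stands in for the real environment in this demonstration.
print(load_settings({"APP_DB_PASSWORD": "example-only"}))
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.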

Supported Languages

When deciding on the file format for your configuration, it is important to consider the support in your programming language of choice. We usually select the programming language before the configuration file format. Some file formats are more popular than others. For instance, JSON is supported natively or through the standard library in almost all mainstream languages.

Code Editor support

Syntax highlighting is an important tool for a developer's productivity. When choosing a file format, it is important to check the support for that file format in your favorite code editor. If it is not supported out of the box, the editor may have a plugin that adds support. Otherwise, look for alternative code editors that support this file format. Generally, the more popular the file format is, the more support it has in source code editors. Besides syntax highlighting, an additional feature worth checking is early detection of syntax errors and warnings.
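
If your editor offers no early error detection for a format, a small script can fill part of the gap. The sketch below checks JSON text with Python's standard json module and reports where parsing failed. The helper name check_json_syntax is invented for this example, and the exact wording of the message depends on the Python version.

```python
import json


def check_json_syntax(text):
    """Return None if `text` is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the position where parsing failed.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


print(check_json_syntax('{"port": 1234}'))
# A trailing comma is not allowed in JSON, so this reports an error.
print(check_json_syntax('{"port": 1234,}'))
```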

Nested vs Flat Configuration Data

When choosing a file format, it is good to consider the structure of your configuration data. Some configuration file formats handle nested values better than others. As nesting becomes deeper, readability becomes an important factor. How does your configuration format handle nested arrays and objects? JSON and YAML express nesting directly in their syntax, while TOML has a syntax similar to the INI file format and expresses nesting through table headers such as [services.login].

JSON Example

{
  "databases": [
    {
      "ip": "127.0.0.1",
      "port": "1234"
    }
  ],
  "services": {
    "login": {"ip": "127.0.0.1"},
    "account": {"ip": "127.0.0.1"}
  }
}

YAML Example

databases:
  - ip: 127.0.0.1
    port: '1234'
services:
  login:
    ip: 127.0.0.1
  account:
    ip: 127.0.0.1

TOML Example

[[databases]]
ip = "127.0.0.1"
port = "1234"

[services.login]
ip = "127.0.0.1"

[services.account]
ip = "127.0.0.1"
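
To see how the nested data in the examples above would look as flat keys, here is a small Python sketch that flattens a parsed configuration into dotted key paths. The flatten helper and its dotted naming scheme are assumptions for illustration, not part of any of the three formats.

```python
import json


def flatten(data, prefix=""):
    """Flatten nested dicts and lists into a single dict with dotted keys."""
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        # List positions become numeric path segments.
        items = enumerate(data)
    else:
        return {prefix: data}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


NESTED = json.loads("""
{
  "databases": [{"ip": "127.0.0.1", "port": "1234"}],
  "services": {
    "login": {"ip": "127.0.0.1"},
    "account": {"ip": "127.0.0.1"}
  }
}
""")

flat_config = flatten(NESTED)
for key, value in flat_config.items():
    print(f"{key} = {value}")
```

If the flat form is still easy to read, a flat format may be enough; if the key paths grow long and repetitive, a format with real nesting is probably the better fit.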

Simplicity

How quick is this data format to learn and use? Is it well documented, and are there enough resources to get started? Can new developers joining a project quickly learn to use it?

Performance

The time it takes to process a configuration file varies with the file format.

Example: YAML vs JSON processing performance

For example, processing a JSON file is generally much faster than processing a YAML file. A primary reason is that JSON is a much simpler format than YAML. YAML offers more features than JSON, and supporting them adds processing time when parsing YAML files.

Another factor is that not all languages support YAML natively; modules or libraries are written to provide this support. Generally, functionality built into a language runtime is faster than functionality written in the language itself. In Node.js, JSON support is part of the language, whereas YAML support is provided by npm libraries. Comparing the two, the native C++ code that parses JSON can be expected to run much faster than JavaScript code that parses YAML.
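
Rather than relying on general claims, you can measure parsing time for your own data. Here is a rough Python sketch using the standard timeit module. The sample data is made up, and the YAML part runs only if the third-party PyYAML package happens to be installed. Actual numbers depend on the machine, the library versions and the shape of the data.

```python
import json
import timeit

# Made-up sample data: 200 hypothetical services.
data = {
    "services": {
        f"service_{i}": {"ip": "127.0.0.1", "port": 8000 + i}
        for i in range(200)
    }
}
RUNS = 20

json_text = json.dumps(data)
json_seconds = timeit.timeit(lambda: json.loads(json_text), number=RUNS)
print(f"JSON: {json_seconds:.4f}s for {RUNS} runs")

try:
    import yaml  # PyYAML, a third-party package that may not be installed
except ImportError:
    yaml = None

if yaml is not None:
    # Serialize the same data as YAML so both parsers do equivalent work.
    yaml_text = yaml.safe_dump(data)
    yaml_seconds = timeit.timeit(lambda: yaml.safe_load(yaml_text), number=RUNS)
    print(f"YAML: {yaml_seconds:.4f}s for {RUNS} runs")
```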

What now?

Just as every tool has its purpose and limitations, every data format does too, so it is important to consider many criteria when selecting one for your configuration files. Syntax, popularity, simplicity for newcomers, and performance are all factors to consider.