What is TOML? An easier way to configure Python apps and more

TOML makes writing configuration files simple, straightforward, and more human-readable than many other formats, including JSON.

It's funny how some of the simplest software development decisions turn out to also be the toughest. One example is choosing the configuration file format for your application or service. Both JSON and YAML come to mind, of course. But if you need a config file format that's easy to understand and relatively easy to parse into a data structure, you might consider TOML.

TOML, or Tom's Obvious Minimal Language, was created chiefly for storing configuration data, with features that JSON and other formats lack. As an example, JSON doesn't support comments of any kind. TOML lets you insert comments simply by prefixing them with a hash symbol, as you would in Python. Small wonder, then, that Python itself is gravitating toward TOML as a configuration standard. (See the pyproject.toml file that pip reads when building packages.)

TOML format basics

TOML format files consist of key-value pairs, where keys are strings and values can be one of a number of types. In some ways it's reminiscent of the Microsoft Windows .ini file format, but with support for a broader range of data types.

Here's an example:


name = "string"
integer = 3
float-value = 3.14159
boolean_value = true
"quoted ünicode key" = true
# this is a comment
data = "OK" # a comment after a key/value

Keys are always interpreted as strings. Values can be strings, integers, floats, booleans, various types of date-time values, and two special kinds of values called arrays and inline tables. (More on these shortly.)

Everything after a hash symbol to the end of a line is a comment. This doesn't include hash symbols that are part of key or value strings themselves.

Arrays in TOML

An array is a way to store multiple values in a single key:


int_values = [1, 2, 4, 8, 16]
strings = ["prime", "audio", "soup"]
mixed = [1, 2, "Three", 4.0]
multi-line = [
    "array",
    "of",
    "strings"
]

As you can guess, arrays don't have to all contain the same type of value. And their definitions can span multiple lines if needed. In Python, arrays map directly to lists.

Tables in TOML

A table is a collection of key-value pairs in a TOML file, labeled with a header in square brackets. In Python, a table would be handled like a nested dictionary:


[general]
make_network_connection = true
ping_time = 1200

[user]
default_name = "Anonymous"
ping_time = 1600

The Microsoft Windows .ini file format has a similar feature that serves a similar purpose: it gives each group of key-value pairs its own namespace. For instance, the ping_time values for general and user are distinct because they live in separate namespaces; one won't overwrite the other.

Dotted names, inline tables, and table arrays

Another way to produce the same namespacing effect as tables in TOML is to use dotted names. Here's how the above example could be rendered:


general.make_network_connection = true
general.ping_time = 1200

user.default_name = "Anonymous"
user.ping_time = 1600

Yet another way is with inline tables, which is a more compact style and may be more readable for some collections of values:


general = {make_network_connection = true, ping_time = 1200}
user = {default_name = "Anonymous", ping_time = 1600}

Note that the inline table format may look superficially similar to a Python dictionary declaration, but it isn't; it uses = instead of : to depict key/value pairs.

One more thing you can do with tables is create an array of tables, or a table array:


[[movies]]
name = "Blade Runner"
year = 1982
[[movies]]
name = "Blade Runner 2049"
year = 2017

This creates a nested structure, akin to the following in JSON:


{
    "movies": [
        {"name": "Blade Runner", "year": 1982},
        {"name": "Blade Runner 2049", "year": 2017}
    ]
}

Using TOML in Python

Because some parts of the Python ecosystem now use TOML as a configuration language, Python's support for TOML is growing.

As of Python 3.11, the standard library includes tomllib, a module that provides a Python-native way to read TOML and parse it into Python objects, mainly dictionaries. However, that's all tomllib does. It doesn't serialize Python objects into TOML, so it's suited only to reading TOML files, not writing them.

The Python core team might eventually add writing TOML as part of tomllib's duties, but it's not expected to happen anytime soon. In the meantime, the tomllib module is useful for ingesting and parsing TOML from configuration files without needing external libraries. If you need to serialize Python objects to TOML, there are a couple of third-party solutions:

  • tomli-w is a minimal library for writing out Python dictionaries to TOML. Note that it doesn't perform automatic validation of the written TOML; for that you'll want to load the data with tomllib and see if it validates.
  • tomlkit performs both reading from and writing out to TOML, so it's a fairly complete solution, especially if you need to support Python versions earlier than 3.11 (which most people will).

TOML gotchas

Finally, there are a couple of gotchas to be careful about, especially if you're writing TOML by hand:

  1. Watch out for directly mapping key names to program variable names that are not supported in the language you're using. For instance, Python allows an underscore in a variable name, but not a dash, so the key float-value above could not be mapped directly to a variable with that name. However, it could be mapped to a key in a dictionary, since Python dictionary keys can be any string.

  2. Booleans in TOML follow JSON's conventions: they are written as lowercase true and false, rather than Python's idiosyncratic True and False. TOML libraries should deal with this automatically, but be aware of it when authoring TOML by hand.

Copyright © 2022 IDG Communications, Inc.