The YAML (“YAML Ain't Markup Language”) configuration language sits at the heart of many modern applications including Kubernetes, Ansible, CircleCI, and Salt. After all, YAML offers many advantages, like readability, flexibility, and the ability to work with JSON files. But YAML is also a source of pitfalls and gotchas for the uninitiated or incautious.
Many aspects of YAML’s behavior allow for momentary convenience, but at the cost of unexpected zigs or zags later on down the line. Even folks with plenty of experience assembling or deploying YAML can be bitten by these issues, which often surface in the guise of seemingly innocuous behavior.
Here are seven steps you can take to guard against the most troublesome gotchas in YAML.
When in doubt, quote strings
The single most powerful defensive practice you can adopt when writing YAML: Quote everything that is meant to be a string.
One of YAML’s best-known quirks is that you can write strings without quoting:
- movie:
    title: Blade Runner
    year: 1982
In this example, the keys movie, title, and year will be interpreted as strings, as will the value Blade Runner. The value 1982 will be parsed as a number.
But what happens here?
- movie:
    title: 1979
    year: 2016
That’s right—the movie title will be interpreted as a number. And that’s not even the worst thing that can happen:
- movie:
    title: No
    year: 2012
What are the odds this title will be interpreted as a boolean?
If you want to make absolutely sure that keys and values will be interpreted as strings, and guard against any potential ambiguities (and a lot of ambiguities can creep into YAML), then quote your strings:
- "movie":
    "title": "Blade Runner"
    "year": 1982
If you’re unable to quote strings for some reason, you can use a shorthand prefix to indicate the type. These make YAML a little noisier to read than quoted strings, but they are just as unambiguous as quoting:
movie: !!str Blade Runner
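To see why an unquoted title like 1979 stops being a string, it can help to sketch how a parser types plain scalars. The Python below is a rough approximation of YAML 1.1-style numeric resolution, not any library's actual implementation; the function name and the two patterns are assumptions made for illustration, and real parsers recognize many more forms.

```python
import re

# Simplified patterns for plain (unquoted) scalars. Real YAML parsers also
# recognize hex, octal, infinities, and other numeric forms not covered here.
_INT = re.compile(r"[-+]?[0-9]+")
_FLOAT = re.compile(r"[-+]?[0-9]+\.[0-9]*")


def plain_scalar_type(text: str) -> str:
    """Guess how an unquoted scalar would be typed: 'int', 'float', or 'str'."""
    if _INT.fullmatch(text):
        return "int"
    if _FLOAT.fullmatch(text):
        return "float"
    return "str"


# The title "1979" gets the same treatment as the year 1982.
for value in ["Blade Runner", "1982", "1979", "2.0"]:
    print(f"{value!r}: {plain_scalar_type(value)}")
```

Quoting the value, or tagging it with !!str, takes this guessing step out of the picture.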
Beware of multiline strings
YAML has multiple ways to represent multiline strings, depending on how those strings are formatted. For instance, unquoted strings can simply be broken across multiple lines when prefixed with a >:
long string: >
  This is a long string
  that spans multiple lines.
Note that using > automatically appends a \n at the end of the string. If you don’t want the trailing new line, then use >- instead of >.
If you use double-quoted strings, YAML folds an unescaped line break into a single space. To control the break yourself, end each line with a backslash:
long string: "This is a long string \
  that spans multiple lines."
Note that any spaces after a line break are interpreted as YAML formatting, not as part of the string. This is why the space is inserted before the backslash in the example above. It ensures the words string and that don’t run together.
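The folding rules are easier to remember once you have seen them spelled out. Here is a minimal Python sketch of the two behaviors described above, assuming the simplest case only: no blank lines, no extra-indented lines, and no trailing spaces. Both function names are hypothetical, and this is not how any real YAML parser is written.

```python
def fold_block(lines, chomp=""):
    """Fold the content lines of a '>' block scalar into one string.

    chomp=""  mimics '>'  (one trailing newline is kept)
    chomp="-" mimics '>-' (the trailing newline is stripped)
    """
    text = " ".join(line.strip() for line in lines)
    return text if chomp == "-" else text + "\n"


def join_escaped(lines):
    """Join double-quoted lines whose line breaks are escaped with a backslash.

    The backslash and the break are dropped, along with the indentation of
    the following line. Spaces before the backslash are kept.
    """
    parts = []
    for index, line in enumerate(lines):
        if index > 0:
            line = line.lstrip()  # leading indentation is formatting only
        if line.endswith("\\"):
            line = line[:-1]      # drop the escaping backslash
        parts.append(line)
    return "".join(parts)


content = ["This is a long string", "that spans multiple lines."]
print(repr(fold_block(content)))        # folded, with trailing newline
print(repr(fold_block(content, "-")))   # folded, without trailing newline
print(repr(join_escaped(["This is a long string \\", "  that spans multiple lines."])))
```

Remove the space before the backslash in the last call and the two halves come back glued together as "stringthat".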
Beware of booleans
As hinted above, one of YAML’s other big gotchas is boolean values. There are so many ways to specify booleans in YAML that it is all too easy for an intended string to be interpreted as a boolean.
One notorious example of this is the two-letter country code problem. If your country is US or UK, fine. If your country is Norway, the country code for which is NO, that is no longer a string. It’s a boolean that evaluates to false!
Whenever possible, be deliberately explicit with both boolean values and shorter strings that might be misinterpreted as booleans. YAML’s shorthand prefix for booleans is !!bool.
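One way to catch the country code problem before it reaches a parser is to check short strings against the boolean spellings YAML 1.1 defines. The Python below is a sketch under that assumption. The function name and the sample mapping are hypothetical, and individual parsers may honor only a subset of these spellings, so check what your own library does.

```python
# Boolean spellings listed for YAML 1.1. Treat the exact set your parser
# accepts as something to verify, not something this sketch guarantees.
_TRUE = {"y", "yes", "true", "on"}
_FALSE = {"n", "no", "false", "off"}


def yaml11_bool(text):
    """Return True or False if `text` reads as a YAML 1.1 boolean, else None."""
    # Only lowercase, Capitalized, and UPPERCASE forms count (not "nO").
    if text not in (text.lower(), text.capitalize(), text.upper()):
        return None
    key = text.lower()
    if key in _TRUE:
        return True
    if key in _FALSE:
        return False
    return None


country_codes = {"United States": "US", "United Kingdom": "UK", "Norway": "NO"}
risky = [name for name, code in country_codes.items() if yaml11_bool(code) is not None]
print(risky)  # the countries whose codes need quoting
```

Any value this check flags is a candidate for quotes or an explicit !!str tag.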
Watch out for multiple forms of octal
This is an out-of-the-way gotcha, but it can be troublesome. YAML 1.1 uses a different notation for octal numbers than YAML 1.2. In YAML 1.1, octal numbers look like 0777. In YAML 1.2, that same octal becomes 0o777. It’s much less ambiguous.
Kubernetes, one of the biggest users of YAML, uses YAML 1.1. If you use YAML with other applications that use version 1.2 of the spec, be extra-careful not to use the wrong octal notation. Since octal is generally used only for file permissions these days, it’s a corner case compared to other YAML gotchas. Still, YAML octal can bite you if you’re not careful.
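The danger is that the wrong notation doesn't necessarily fail; it can quietly produce a different value. The Python sketch below mimics how each version of the spec reads unsigned integers, assuming the YAML 1.1 integer rules and the YAML 1.2 core schema. Both function names are hypothetical, and signs, underscores, and hexadecimal are left out for brevity.

```python
import re


def parse_int_yaml11(text):
    """YAML 1.1 style: a leading 0 marks octal; '0o' is not a number."""
    if re.fullmatch(r"0[0-7]+", text):
        return int(text, 8)
    if re.fullmatch(r"0|[1-9][0-9]*", text):
        return int(text)
    return None  # the value would stay a string


def parse_int_yaml12(text):
    """YAML 1.2 core schema style: '0o' marks octal; a leading 0 is decimal."""
    if re.fullmatch(r"0o[0-7]+", text):
        return int(text[2:], 8)
    if re.fullmatch(r"[0-9]+", text):
        return int(text)
    return None  # the value would stay a string


for literal in ["0777", "0o777"]:
    print(literal, parse_int_yaml11(literal), parse_int_yaml12(literal))
```

Under these rules, 0777 is 511 to a YAML 1.1 parser but 777 to a YAML 1.2 parser, which is not the file permission you meant.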
Beware of executable YAML
Executable YAML? Yes. Many YAML libraries, such as PyYAML for Python, have allowed the execution of arbitrary commands when deserializing YAML. Amazingly, this isn’t a bug, but a capability YAML was designed to allow.
In PyYAML’s case, the default behavior for deserialization was eventually changed to support only a safe subset of YAML that doesn’t allow this sort of thing. The original behavior can be restored manually (see the PyYAML documentation for details on how to do this), but you should avoid using this feature if you can, and disable it by default if it isn’t already disabled.
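If you load YAML from Python, the simplest precaution is to ask for the safe loader explicitly instead of relying on defaults. Here is a minimal sketch using PyYAML's safe_load. The helper names and the sample document are made up for illustration, and the guarded import is only there so the snippet still runs where PyYAML isn't installed.

```python
try:
    import yaml  # PyYAML, a third-party package (pip install pyyaml)
except ImportError:
    yaml = None  # lets the sketch load even where PyYAML is absent

# A document that asks the loader to call os.system while deserializing.
UNTRUSTED = '!!python/object/apply:os.system ["echo pwned"]'


def load_safely(text):
    """Parse YAML with the safe loader, which builds only plain data types."""
    return yaml.safe_load(text)


def is_rejected(text):
    """Return True if the safe loader refuses to construct the document."""
    try:
        load_safely(text)
    except yaml.YAMLError:
        return True
    return False


if yaml is not None:
    print(load_safely("title: Blade Runner"))
    print(is_rejected(UNTRUSTED))
```

The safe loader has no constructor for Python-specific tags, so the second document raises an error instead of running the command.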
Beware of inconsistencies when serializing and deserializing
Another potential issue with YAML is that different YAML-handling libraries across different programming languages sometimes generate different results.
Consider: If you have a YAML file that includes boolean values represented as true and false, and you re-serialize that to YAML using a different library that represents booleans as y and n or on and off, you could get unexpected results. Even if the code remains functionally the same, it could look totally different.
Don’t use YAML
The most general way to avoid problems with YAML? Don’t use it. Or at least, don't use it directly.
If you have to write YAML as part of a configuration process, it could be safer to write the code in JSON or native code (e.g., Python dictionaries), then serialize that to YAML. You’ll have more control over the types of objects, and you’ll be more comfortable using a language you already work with.
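As a rough illustration of that approach, the Python below builds a configuration as a native dictionary and serializes it with the standard library's json module. It leans on YAML's ability to work with JSON, so confirm that the tool consuming the file accepts JSON-style input. The keys and values are invented for the example.

```python
import json

# Types are explicit here: strings are strings, numbers are numbers,
# and booleans are booleans, with nothing left for a parser to guess.
config = {
    "movie": {"title": "No", "year": 2012},
    "country": "NO",
    "enabled": True,
    "file_mode": "0777",
}


def to_yaml_compatible(data):
    """Serialize to JSON text, in which every string is emitted quoted."""
    return json.dumps(data, indent=2)


document = to_yaml_compatible(config)
print(document)
```

Because every string comes out quoted, the title No and the country code NO can't be mistaken for booleans, and the file mode stays a string until your own code decides how to read it.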
Failing that, you could use a linter such as yamllint to check for common YAML problems. For instance, you can forbid truthy values like YES or off in favor of simply true and false, or enforce string quoting.