Introduction to YAML

Paula Santamaría ・5 min read

The first time I came across YAML was around a year ago, when I used it to write OpenAPI definitions to document a RESTful API using Swagger API Documentation and, to be honest, I really hated it.

Being a JSON "fan", the YAML syntax felt weird and unnatural to me, so for a while, I didn't pay any attention to it.

This changed a few months ago, when I started to get into CI/CD, since both Azure and GitLab pipelines require a YAML file to set up. So I finally decided to properly learn YAML, and after doing some reading I found the ideas behind it fascinating.

In this article I'll cover the basics of YAML, including its main goals, basic syntax and some of its more complex features.

Introduction

YAML is a data-serialization language often used for configuration files, such as OpenAPI specifications or CI/CD pipelines.

Fun fact! 🤓

According to the YAML 1.0 specification document (2001-05-26), the acronym "YAML" originally stood for "Yet Another Markup Language", but it was later changed to the recursive acronym "YAML Ain't Markup Language" in the 2002-04-07 specification.

As stated in the latest spec, YAML is designed to be friendly to people working with data, and it achieves "unique cleanness" by minimizing the use of structural characters, allowing the data to appear in a natural and meaningful way.

The latest spec also states that YAML 1.2 treats JSON as an official subset, meaning that most JSON documents are also valid YAML and can be read by a YAML parser.
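
To make the subset idea concrete, here's a small sketch with two documents separated by ---. The keys and values are made up for illustration. A YAML 1.2 parser should read both documents as the same data:

```yaml
# A JSON document, read as-is by a YAML 1.2 parser:
{"title": "Introduction to YAML", "tags": ["yaml", "json"], "published": false}
---
# The same data written in YAML's block style:
title: Introduction to YAML
tags:
    - yaml
    - json
published: false
```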

YAML makes data structures easy to inspect by using indentation-based scoping (similar to Python).

Another fun fact! 🤓

DEV.to articles use YAML to define custom variables like title, description, tags, etc.
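
As a rough sketch, the front matter at the top of a post could look something like this. The values are made up for illustration, and the comma-separated form for tags is an assumption on my part:

```yaml
---
title: Introduction to YAML
description: The basics of YAML, from syntax to anchors and tags.
tags: yaml, beginners
---
```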

Basic Syntax

YAML documents are basically a collection of key-value pairs where the value can be as simple as a string or as complex as a tree.
Here are a few notes about YAML syntax:

  • Indentation is used to denote structure. Tabs are not allowed for indentation, and the amount of whitespace doesn't matter as long as the child node is more indented than the parent and sibling nodes share the same indentation.
  • UTF-8, UTF-16 and UTF-32 encodings are allowed.
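
Here's a minimal sketch of the indentation rule. The key names are hypothetical. The first level uses two spaces and the second uses four, which is valid because siblings still line up with each other:

```yaml
server:
  host: localhost
  ports:
      http: 80
      https: 443
```

Mixing widths like this works, but picking one width and sticking to it is much easier to read.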

Strings

# Strings don't require quotes:
title: Introduction to YAML

# But you can still use them:
title-w-quotes: 'Introduction to YAML'

# Multiline strings start with |
execute: |
    npm ci
    npm build
    npm test

The above code will translate to JSON as:

{
    "title": "Introduction to YAML",
    "title-w-quotes": "Introduction to YAML",
    "execute": "npm ci\nnpm build\nnpm test\n"
}
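
Quotes do become necessary when a plain value could be read as something else. A short sketch with made-up keys and values:

```yaml
# Unquoted, 1.10 would be read as the float 1.1:
version: "1.10"

# Unquoted, the ": " inside the value would be treated as a key-value separator:
message: "Error: file not found"

# Unquoted, " #" would start a comment and cut the value short:
channel: "general #random"

# Double quotes process escape sequences such as \n, single quotes don't:
double: "line one\nline two"
single: 'no \n escape here'
```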

Numbers

# Integers:
age: 29

# Float:
price: 15.99

# Scientific notation:
population: 2.89e+6

The above code will translate to JSON as:

{
    "age": 29,
    "price": 15.99,
    "population": 2890000
}

Boolean

# Boolean values can be written in different ways (each line below is a separate alternative):
published: false
published: False
published: FALSE

All of the above will translate to JSON as:

{
    "published": false
}

Null values

# Null can be represented by simply not setting a value (each line below is a separate alternative):
null-value: 

# Or more explicitly:
null-value: null
null-value: NULL
null-value: Null

All of the above will translate to JSON as:

{
    "null-value": null
}
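
Two more details that are easy to miss, shown as a small sketch with hypothetical keys: the tilde is another way to write null, and quoting the word turns it into a plain string.

```yaml
# The tilde is also read as null:
tilde-null: ~

# Quoted, this is the four-letter string "null", not a null value:
not-null: "null"
```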

Dates & timestamps

ISO-formatted dates can be used, like so:

date: 2002-12-14
canonical: 2001-12-15T02:59:43.1Z
iso8601: 2001-12-14t21:59:43.10-05:00
spaced: 2001-12-14 21:59:43.10 -5
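
Whether an unquoted date ends up as a date type or stays a string depends on the parser and the YAML version it implements, so treat the following as a sketch rather than a guarantee. The key names are hypothetical:

```yaml
# May be converted to a native date type by the parser:
release-date: 2002-12-14

# Quoted, so it always stays a string:
release-label: "2002-12-14"
```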

Sequences

Sequences allow us to define lists in YAML:

# A list of numbers using hyphens:
numbers:
    - one
    - two
    - three

# The inline version:
numbers: [ one, two, three ]

Both of the above sequences will parse to JSON as:

{
    "numbers": [
        "one",
        "two",
        "three"
    ]
}

Nested values

We can use all of the above types to create an object with nested values, like so:

# Nineteen eighty four novel data.
nineteen-eighty-four:
    author: George Orwell
    published-at: 1949-06-08
    page-count: 328
    description: |
        A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell.
        It was published in June 1949 by Secker & Warburg as Orwell's ninth and final book.

Which will translate to JSON as:

{
    "nineteen-eighty-four": {
        "author": "George Orwell",
        "published-at": "1949-06-08T00:00:00.000Z",
        "page-count": 328,
        "description": "A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell.\nIt was published in June 1949 by Secker & Warburg as Orwell's ninth and final book.\n"
    }
}

List of objects

Combining sequences and nested values, we can create a list of objects.

# Let's list books:
- nineteen-eighty-four:
    author: George Orwell
    published-at: 1949-06-08
    page-count: 328
    description: |
        A Novel, often published as 1984, is a dystopian novel by English novelist George Orwell.

- the-hobbit:
    author: J. R. R. Tolkien
    published-at: 1937-09-21
    page-count: 310
    description: | 
        The Hobbit, or There and Back Again is a children's fantasy novel by English author J. R. R. Tolkien.

Distinctive Features

The following are some more complex features that caught my attention and that also differentiate YAML from JSON.

Comments

As you've probably already noticed in my prior examples, YAML allows comments starting with #.

# This is a really useful comment.
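
Comments can also sit at the end of a line. A quick sketch, using a placeholder URL:

```yaml
# A full-line comment.
title: Introduction to YAML  # An inline comment after a value.

# A '#' only starts a comment when it follows whitespace,
# so here it is part of the value:
url: https://example.com/#section
```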

Reusability with Node Anchors

Node anchors mark a node for future reference, which allows us to reuse the node. To mark a node we use the & character, and to reference it we use *.

In the following example we'll define a list of books and reuse the author data, so we only have to define it once:

# The author data:
author: &gOrwell 
    name: George
    last-name: Orwell

# Some books:
books: 
    - 1984:
        author: *gOrwell 
    - animal-farm:
        author: *gOrwell

The above code will look like this once parsed to JSON:

{
    "author": {
        "name": "George",
        "last-name": "Orwell"
    },
    "books": [
        {
            "1984": {
                "author": {
                    "name": "George",
                    "last-name": "Orwell"
                }
            }
        },
        {
            "animal-farm": {
                "author": {
                    "name": "George",
                    "last-name": "Orwell"
                }
            }
        }
    ]
}
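
Anchors are often combined with the merge key (<<), which copies the keys of an anchored mapping into another mapping. Take this as a sketch under an assumption about your parser: the merge key comes from the YAML 1.1 type repository rather than the 1.2 core spec, so support varies. The job and key names below are hypothetical:

```yaml
# Shared settings, marked with an anchor:
defaults: &defaults
    image: node
    retries: 2

build:
    <<: *defaults
    script: npm run build

test:
    <<: *defaults
    retries: 0        # Keys set explicitly override the merged ones.
    script: npm test
```

With a parser that supports merging, test ends up with image: node, retries: 0 and its own script.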

Explicit data types with tags

As we've seen in previous examples, YAML autodetects the type of our values, but it's possible to specify which type we want.
We specify the type by including it before the value preceded by !!.

Here are some examples:

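# Read a quoted value as an int (many parsers reject non-integer content such as 3.2 here):
should-be-int: !!int "3"

# Parse any value to string:
should-be-string: !!str 30.25

# I need the next value to be boolean (YAML 1.1 parsers accept "yes"; stricter YAML 1.2 parsers may only accept true/false):
should-be-boolean: !!bool yes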

This will translate to JSON as:

{
    "should-be-int": 3,
    "should-be-string": "30.25",
    "should-be-boolean": true
}

Conclusion

Reading and writing about YAML, and experimenting with it was super interesting.

What I like: I especially loved reading about the goals of YAML in relation to code cleanness and readability, and how it achieves them. I also feel better about properly learning the syntax at last 😅.

What I don't like: I need a parser (which means installing a new dependency) to use YAML with the main technologies I work with (Node.js and .NET Core).

However, I will now consider YAML, especially if I need something that JSON can't cover, like reusability, explicit types or comments. I'm sure that working with pipelines will be easier now too.

Also, I'd strongly recommend reading the Introduction of the YAML 1.2 specification (3rd Edition) to learn more about YAML's goals, origins and relationship with other languages.

What are you using YAML for? 💬

Are you using YAML? For what? What are your thoughts about it?


Discussion


I used a YAML file to configure a cluster group in the pipeline during my internship this past summer. It was a bit challenging since this was my first time using it, but it was definitely easy to work with. Thanks for sharing this great article.

 

Nice! Was it your first time working with it? And did its readability make it easier to pick up?

 

Thanks for all the tips and tricks. I like using YAML for configurations.

PS: YAML has some default casting one should be aware of:

In [3]: import yaml  # PyYAML

In [4]: yaml.safe_load('yes') # 'yes' and 'no' become booleans
Out[4]: True

In [5]: yaml.safe_load('1_000_000')
Out[5]: 1000000

 

Thank you! According to the YAML 1.2 specification document 'yes' and 'no' are no longer interpreted as boolean.

We have removed unique implicit typing rules and have updated these rules to align them with JSON's productions. In this version of YAML, boolean values may be serialized as “true” or “false”;

With a parser that still accepts the YAML 1.1 forms, you can use !!bool to parse them, though.

 

Thanks for this! I just got started with GitHub Actions a couple of days ago, and was making a lot of assumptions on what the YAML was representing -- the translations to JSON you've done here are really helpful :)

 

Thanks Darren, I'm glad I could help! 🙂

 

I use YAML for my default config files with tools such as ESLint and Stylelint. I find the syntax a lot more intuitive than JSON and I am less likely to make mistakes with it.

Thank you for making this tutorial, Paula.

 

Nice! Do you mind sharing a bit more about how you use it? Like which language and do you use a parser?
I'm really curious because I love the syntax, but I don't like the idea of including extra dependencies just for that.

 

I currently use YAML only when a Node.js package offers built-in support for it. I stick with JSON otherwise. I don't know much about parsers as I've only used Babel before and it doesn't support YAML, as far as I know.

 

I use YAML so often with Jekyll, and I didn't know I could write multiline strings just by adding this |. Amazing. Thanks.

 

Something I forgot to include about strings is that you can also write multiline strings that you don't want to be interpreted as multiline. For example:

single-line-string: > 
    This
    should
    be
    one
    line

And this is how it'll look in JSON:

{
    "single-line-string": "This should be one line\n"
}

When using the > character instead of |, each line break inside the text is folded into a single space (the final line break is kept).
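
Both | and > keep a single trailing line break by default. Adding a - right after the indicator strips it. A small sketch with hypothetical keys, with the resulting string noted in each comment:

```yaml
# Result: "npm test\n"
keep-newline: |
    npm test

# Result: "npm test"
strip-newline: |-
    npm test

# Result: "This is one line"
folded-stripped: >-
    This
    is one line
```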

 

I'm glad! Thanks for reading :D

 

I used YAML for my uni assignment where we had to build a CI/CD pipeline using Travis CI for a Spring Boot application. Travis uses a YAML file for configuration and I found it very easy and intuitive to work with.

 

Nice! Every CI pipeline config I've seen so far uses YAML. I believe we'll be seeing more of it in the near future.

 

Hi, Paula. Great article!
I have one question: in JSON we can easily create a list of objects without "naming" them, like this:
[{"a": "b"}, {"x": "y"}]
How do we do that in YAML? For example, a list of authors (I don't want to have [{"authors": {authorObj1}}, {"authors": {authorObj2}}]).

 

Great question!
You can achieve that by entering the hyphen first and then the properties on a new line, like so:

- 
    name: George
    last-name: Orwell
- 
    name: Stephen
    last-name: King

Which will translate to JSON as:

[
    {
        "name": "George",
        "last-name": "Orwell"
    },
    {
        "name": "Stephen",
        "last-name": "King"
    }
]
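
A more compact way to write the same list, which should produce the same JSON, is to put the first property on the same line as the hyphen and align the remaining ones under it:

```yaml
- name: George
  last-name: Orwell
- name: Stephen
  last-name: King
```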

Also here's a nice online tool I've been using to try the YAML syntax and see its JSON counterpart.

 

I use it since Jekyll themes use it for configuration data (front matter). As you described, at first glance it seems weird, but once you spend some time with it, you get used to it.

 

Yes! Once I found out I could use comments and felt comfortable enough to structure the info however I liked, I realized how easy it was.

 

Yay thanks for that post, I've been using it slightly but happy to have learned something on it!

 

Great! I'm glad I could help :)

 

I use yaml for Puppet and Ansible

 

I didn't know any of those so I had to do a quick search. Sounds interesting!

 

I started using YAML with Docker, and since then I've found it so useful and simple that it makes me feel like I'm reading an article, not a data file.
Thanks Paula for this amazing article.

 

Docker Compose to be precise ✌️✌️✌️

 

Thank you Mahmoud!
I've been meaning to get into Docker; maybe after learning about YAML it'll be easier.

 

I ran into YAML when working with AWS CloudFormation. It beat me up like a drum 😒

 

Believe me, I know the feeling!😂
One of the things that made me change my mind after reading about it for a bit was that the whole concept behind YAML made me remember how I felt after going from XML to JSON, which was basically "wow, so much less code, I can actually read this!".

 

I've been using YAML for the last 2 years and I've completely ignored learning anything about it. This was a good beginning.

Thanks

 

Same here 😂. Its focus on readability makes it easy to pick up without actually learning it, but I always felt a bit uncomfortable with it until I wrote this. I also hated having to google specific stuff like "multi-line strings in yaml".

 
 

Hey yamlonline.com/
checkout this editor for YAML to JSON
try it!!

 

So YAML does for JSON what Markdown did for HTML.

But I'm no fan of YAML's "no tabs" rule or the "as many spaces as you want" rule. Seems an odd choice backed by a flimsy excuse.

 

I found those a bit odd too at first. However, the YAML specification addresses the no tab rule:

Note that most modern editors may be configured so that pressing the tab key results in the insertion of an appropriate number of spaces.

And I found that to be true for VSCode at least.

And about the number of spaces, I just handle that myself and keep it consistent. Haven't had an issue with that, to be honest.

 
