Siddharth

Posted on Sep 22, 2021

Why not TOML?

#programming #toml

# this is some TOML

name = 'TOML'

[created-by]
name = 'Tom Preston-Werner'
dob = 1979-05-27T07:32:00-08:00

TOML is a configuration file format designed like an "improved" INI file.

At first, TOML looks like a simple, readable file format. If you use it infrequently on small, simple files, it's fine.

But as you use it more, you will start seeing the flaws of TOML.

Martin Vejnar, the creator of PyTOML, said the exact same thing. He initially built the parser out of enthusiasm for TOML, but eventually abandoned it:

TOML is a bad file format. It looks good at first glance, and for really really trivial things it is probably good. But once I started using it and the configuration schema became more complex, I found the syntax ugly and hard to read.

So, what's the problem with TOML?

1. It's verbose

This is a yaml document:

json:
  - rigid
  - better for data interchange
yaml: 
  - slim and flexible
  - better for configuration
object:
  key: value
  array:
      - null_value:
      - boolean: true
      - integer: 1
      - alias: &example aliases are like variables
      - alias: *example
paragraph: >
   Blank lines denote

   paragraph breaks
alias: &foo
  bar: baz
alias_reuse: *foo

And here's the same thing in TOML:

json = [
  "rigid",
  "better for data interchange"
]

yaml = [
  "slim and flexible",
  "better for configuration"
]

paragraph = """
Blank lines denote
paragraph breaks
"""

[object]
key = "value"

  [[object.array]]

  [[object.array]]
  boolean = true

  [[object.array]]
  integer = 1

  [[object.array]]
  alias = "aliases are like variables"

  [[object.array]]
  alias = "aliases are like variables"

[alias]
bar = "baz"

[alias_reuse]
bar = "baz"

The YAML example is 368 characters. The TOML version is 455 characters. That's 87 more characters you need to type. TOML re-introduces what human-friendly languages are trying to get rid of: verbose syntax, mandatory quoting of strings, and more.

The reasons are obvious:

When you have an array, you need to repeat the key again and again ([[object.array]]).
There are a lot of syntactic extras, like square brackets and quotes, which dominate TOML.
There is no alias system.

Making programs smaller and DRYer significantly reduces the number of bugs. The same applies to configuration files.
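
To make the repetition concrete, here is a minimal Python sketch that writes a list of small objects as a TOML array of tables. The helper name to_array_of_tables is hypothetical, and json.dumps is only a stand-in that happens to format the simple booleans, integers, and ASCII strings used here. It is not a real TOML serializer.

```python
import json


def to_array_of_tables(name, items):
    """Render each dict in `items` under its own [[name]] header (toy serializer)."""
    lines = []
    for item in items:
        # The full key path has to be written again for every element.
        lines.append(f"[[{name}]]")
        for key, value in item.items():
            # Assumption: json.dumps output is valid TOML for these simple scalars.
            lines.append(f"{key} = {json.dumps(value)}")
        lines.append("")  # blank line between tables
    return "\n".join(lines)


items = [
    {"boolean": True},
    {"integer": 1},
    {"alias": "aliases are like variables"},
]
toml_text = to_array_of_tables("object.array", items)
print(toml_text)
```

Three elements mean three copies of the header, where the YAML version needs only one "array:" key and a dash per item.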

2. TOML looks like it was designed for parsers.

Hierarchy in TOML is determined by dots (.) in the table names. This is simple enough for parsers to understand, but it makes the nesting harder for humans to see.

That's why many people have adopted this style:

[markup]
  [markup.tableOfContents]
    endLevel = 8
    startLevel = 1
  [markup.highlight]
    style = "solarized-dark"

While this makes it easier to understand, the indentation is purely cosmetic (the parser ignores it). It would be much better if we could get rid of the dots (and brackets) and use indentation alone. This is why I love Python's syntax.

There are still debates about whether relying on indentation alone is a good idea, but it generally is, as discussed in this StackExchange question.
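
Here is a rough Python sketch of what the dotted headers in the example above mean to a parser. The function set_by_path is a hypothetical helper, not part of any TOML library, and it ignores quoted keys (which may legally contain dots in real TOML).

```python
def set_by_path(root, dotted_path, values):
    """Walk (and create) one nested dict per dot-separated segment, then store values."""
    node = root
    for part in dotted_path.split("."):
        node = node.setdefault(part, {})
    node.update(values)


config = {}
# Equivalent of the [markup.tableOfContents] and [markup.highlight] tables.
set_by_path(config, "markup.tableOfContents", {"endLevel": 8, "startLevel": 1})
set_by_path(config, "markup.highlight", {"style": "solarized-dark"})
print(config)
```

The nesting lives entirely in the header string, so the parser's job is a trivial split on dots. The reader has to do the same split in their head.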

3. TOML has too many features.

TOML's creator criticizes YAML for having too many features and then does the same thing. Ironic.

For example, TOML has first class dates. If you have been programming for any amount of time, you know the problems associated with date and time (cough momentjs cough).
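
As a small illustration of what a first-class date drags in, this sketch takes the dob value from the example at the top of the post and parses it with Python's standard datetime module. That module is only a stand-in for what a TOML parser would hand back, so treat it as an assumption about the behavior, not as any particular parser's API.

```python
from datetime import datetime, timezone

# Same timestamp as the `dob` value in the TOML sample at the top of the post.
raw = "1979-05-27T07:32:00-08:00"

# With a first-class date type, the config parser performs this step itself.
dob = datetime.fromisoformat(raw)
print(dob.utcoffset())  # the UTC offset now travels with the value

# The application has to decide how to treat that offset, e.g. normalize to UTC.
dob_utc = dob.astimezone(timezone.utc)
print(dob_utc.isoformat())
```

Once the value is a date rather than a string, offsets and conversions become the config format's problem as well as the application's.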

In my opinion there should only be 5 types: string, number, boolean, array, object. This is close to the approach JSON took (JSON adds only null), and it's a good decision.

4. Syntax typing

Take a look at this:

str = "string"
num = 42

TOML lets the users decide what type a thing is. But in most cases, this is not correct as the client should decide what type a thing should have. If the types are incorrect they should be coerced or an error should be thrown.
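
A minimal sketch of what "the client decides" could look like, assuming the file format hands every value over as-is and the program owns a schema. The names coerce and SCHEMA are hypothetical, and the schema here only covers str and int (bool would need special handling, since bool("false") is True in Python).

```python
def coerce(raw, schema):
    """Convert each raw value to the type the client declared, or raise an error."""
    result = {}
    for key, expected in schema.items():
        if key not in raw:
            raise KeyError(f"missing key: {key}")
        try:
            result[key] = expected(raw[key])
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"{key!r} must be {expected.__name__}, got {raw[key]!r}"
            ) from exc
    return result


# The client, not the file syntax, states which type each key should have.
SCHEMA = {"str": str, "num": int}

config = coerce({"str": "string", "num": "42"}, SCHEMA)
print(config)
```

With this approach it no longer matters whether the file says 42 or "42": the value is coerced to the declared type, and something like "forty-two" is rejected with an error.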

5. TOML's rules aren't that obvious

Most of TOML is obvious, but not all.

For example, the [[syntax]] is confusing. It looks like an [object] header with an extra pair of brackets around it, but it actually appends a new element to an array of tables.
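
The difference between the two header forms can be sketched in a few lines of Python. This is a toy, not a TOML parser: apply_header is a hypothetical helper that only handles plain dotted names and ignores the rule that a path through an existing array refers to its last element.

```python
def apply_header(root, header):
    """Return the dict that the key/value lines under `header` would fill."""
    is_array = header.startswith("[[") and header.endswith("]]")
    path = header.strip("[]").split(".")
    node = root
    for part in path[:-1]:
        node = node.setdefault(part, {})
    last = path[-1]
    if is_array:
        # [[name]]: append a fresh table to a list, once per header.
        table = {}
        node.setdefault(last, []).append(table)
        return table
    # [name]: open (or reuse) a single table.
    return node.setdefault(last, {})


doc = {}
apply_header(doc, "[object]")["key"] = "value"
apply_header(doc, "[[object.array]]")["boolean"] = True
apply_header(doc, "[[object.array]]")["integer"] = 1
print(doc)
```

One extra pair of brackets switches the meaning from "this table" to "another element of this list", which is not something the syntax makes obvious.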

If you think I've missed any points (a lot!), or if you disagree with any points I've made, please leave a comment!

I actually have a lot more points to tell, and I'll write a longer article another day. Stay tuned!

If you like the post, follow me on Twitter or here on DEV for more awesome posts and tips!

Top comments (9)

Miron • Jun 9 '22 • Edited

Do you love Python syntax or do you love Python? For the latter, you'd better start to love TOML too, as it isn't going anywhere: pyproject.toml is becoming the standard (PEP518, PEP517, PEP621, PEP660).
Your post is obviously a counterpost to PEP518, opposing everything from those core developers. Indentation for config files doesn't work for me, and I would never format TOML files like you did. Have you ever written a GitHub workflow file? I always struggle with its 6+ levels of indentation and 'hyphenated' list items. Martin Vejnar proposed CSON; just reading through the Python implementation and how it treats whitespace differently from CoffeeScript made me think no, thank you :) If I need an IDE to write config files, then it is probably not the right format...

Jay Jeckel • Sep 22 '21

INI was an amazing concept that does its job very well, being easy to read and easy to parse. TOML is not a good INI extension for many reasons, some you've listed, but YAML isn't any better of a format. The main difference is that I've never had to use TOML, while, for some reason, YAML has been shoved into all kinds of places it doesn't belong.

For one thing, strings should be quoted; this TOML and JSON get right, and YAML gets wrong. For another, significant white space may work for Python, but that doesn't mean it's a good thing and certainly doesn't justify its use in a data file format; this goes doubly so when the - is enough to do the job anyway.

TOML lets the users decide what type a thing is. But in most cases, this is not correct as the client should decide what type a thing should have. If the types are incorrect they should be coerced or an error should be thrown.

I couldn't disagree with this more. Types should be clear and unambiguous, and the correct type should be chosen by the person entering the data, not by the client, parser, or any other software. If the software is having to coerce types, then you're doing something wrong. But, to be fair, this is basically the dynamic typing vs static typing debate in a different package, so I'm sure opinions will differ.

In the end, given the choice, I'd take old school INI over TOML or YAML any day of the week.

Anyway, great article and I look forward to reading your future articles.

Siddharth • Sep 23 '21

I agree with almost all of your points from a programming point of view. But config files are supposed to be human readable, not just programmer readable. I guess I didn't fully mention that in my post; that's why I said I'll be making a better one someday.

TOML is not a good INI extension for many reasons, some you've listed, but YAML isn't any better of a format.

Absolutely. In fact, I was planning to make that my next post.

For one thing, strings should be quoted; this TOML and JSON get right, and YAML gets wrong. For another, significant white space may work for Python, but that doesn't mean it's a good thing and certainly doesn't justify its use in a data file format; this goes doubly so when the - is enough to do the job anyway.

Configuration files are supposed to be human readable and writable, not just programmer readable and writable. Programmers may feel at home knowing the difference between "1" and 1, but other (non-programmer) humans find it confusing. This is where type coercion comes into play.

Also, humans find indents more helpful than braces (that's why we indent code!)

TOML lets the users decide what type a thing is. But in most cases, this is not correct as the client should decide what type a thing should have. If the types are incorrect they should be coerced or an error should be thrown.

I couldn't disagree with this more. Types should be clear and unambiguous, and the correct type should be chosen by the person entering the data, not by the client, parser, or any other software. If the software is having to coerce types, then you're doing something wrong. But, to be fair, this is basically the dynamic typing vs static typing debate in a different package, so I'm sure opinions will differ.

Today you write "42" as a value, your app receives a string and everything goes well. Tomorrow you write 42 and the app crashes because the parser decided that the value 42 was a number. Of course, we could defensively program to avoid this, but having a schema as a source of truth is the best way to go.

In the end, given the choice, I'd take old school INI over TOML or YAML any day of the week.

Me too. In fact, I'm designing a simple INI like file format. It may never be complete, but I'll see.

I love it when people post comments like this on my blog posts. Thanks.

Jay Jeckel • Sep 23 '21

I think we as programmers underestimate the ability of non-programmers and programmer-adjacent people. Normal people understand numbers and they understand that "42" is different from 42. Personally I've never had a problem teaching dentists, doctors, nurses, and receptionists that text should have quotes around it when working with configuration and data files.

Moreover, quoting strings seems like a weird place to draw the line. These normal users already have to learn that some things require colons after them, things below that have to be indented two spaces with a dash, what aliases are, and how & and * are used. I don't see any reason that quoting text is the thing that makes it too complicated for them. Especially when more and more schools are teaching basic programming skills at younger and younger ages.

Today you write "42" as a value, your app receives a string and everything goes well. Tomorrow you write 42 and the app crashes because the parser decided that the value 42 was a number. Of course, we could defensively program to avoid this, but having a schema as a source of truth is the best way to go.

Totally agree that schemas are the way to go, but when some user changed "42" to 42, the parser shouldn't try to fix the human's mistake, it should simply alert the user that they made a mistake by forgetting the quotes. Again, static typing vs dynamic typing; even for non-programmers it is better when things work like a deterministic machine and not like a magic hat.

Also, humans find indents more helpful than braces (that's why we indent code!)

I disagree and think braces with proper indentation are the best option for programmers and programming languages.

Regardless, my point was simply that YAML specifically fails by requiring both indentation and the dash character before items when the dash character should have been enough on its own.

Me too. In fact, I'm designing a simple INI like file format. It may never be complete, but I'll see.

That's awesome! Making a standard, parser, and editor for a backwards compatible extended INI format was one of the first projects I took on as a young programmer and it was a lot of fun.

I love it when people post comments like this on my blog posts. Thanks.

No problem. I love articles like this and thank you for writing it and responding to my comment. :)

Siddharth • Sep 23 '21

I think we as programmers underestimate the ability of non-programmers and programmer-adjacent people. Normal people understand numbers and they understand that "42" is different from 42. Personally I've never had a problem teaching dentists, doctors, nurses, and receptionists that text should have quotes around it when working with configuration and data files.

That's definitely true. But most people will never have heard of strings, or TOML, or INI, and might not have been taught by a programmer. We must consider them too.

Moreover, quoting strings seems like a weird place to draw the line. These normal users already have to learn that some things require colons after them, things below that have to be indented two spaces with a dash, what aliases are, and how & and * are used. I don't see any reason that quoting text is the thing that makes it too complicated for them. Especially when more and more schools are teaching basic programming skills at younger and younger ages.

Actually I was talking about config files in general, not just about YAML. My post has nothing to do with YAML, it just shows it as an example to compare to.

I know that YAML isn't perfect. I know that the two spaces before dash thing is weird (don't ask me how many times my GitHub actions broke before that), and I know how the & and * and | and > are confusing.

Jay Jeckel • Sep 23 '21

Actually I was talking about config files in general, not just about YAML. My post has nothing to do with YAML, it just shows it as an example to compare to.

Apologies, I was also using YAML as a generic example. The same points hold true for any file format; users already have to learn various syntax and semantics of the format, also learning to quote strings isn't much of a stumbling block and brings more benefits than it does disadvantages.

That's definitely true. But most people will never have heard of strings, or TOML, or INI, and might not have been taught by a programmer. We must consider them too.

Agreed, however there has to be a line drawn somewhere. We should assume that anyone allowed to edit the config file has at least been taught (or learned) the format of the file, whatever that format is. After all, in the real world it isn't random people off the street editing config and data files, it is most often trained employees and/or learned users. So by the time they get to the file they should have heard of sections and entries, numbers and strings, arrays and objects, and/or whatever abstractions and constructs the file format requires.

To paraphrase something an old C++ teacher told me back in the day, "Users aren't unintelligent, they are ignorant and that means they can learn." In other words, a user may not know the difference between a number and string, but when the computer yells at them for giving a number instead of a quoted string, they are more than capable of learning the difference. By trying to be forgiving and guessing what they really wanted, one isn't so much helping the user as depriving them of an opportunity to learn and improve.

Siddharth • Sep 24 '21

Actually I was talking about config files in general, not just about YAML. My post has nothing to do with YAML, it just shows it as an example to compare to.

Apologies, I was also using YAML as a generic example. The same points hold true for any file format; users already have to learn various syntax and semantics of the format, also learning to quote strings isn't much of a stumbling block and brings more benefits than it does disadvantages.

Yes, and the point of human friendly languages is to try and minimize whatever needs to be learned. Like you said, quoting and unquoting can be easily learned. But if we remove that it becomes (albeit slightly) easier for users.

Agreed, however there has to be a line drawn somewhere.

Definitely, we have to. I was just mentioning the others.

By trying to be forgiving and guessing what they really wanted, one isn't so much helping the user as depriving them of an opportunity to learn and improve.

Makes sense.

Posandu • Sep 22 '21

Is this a test post ? XD

Siddharth • Sep 22 '21

No 🤣