
Stop Using YAML

Jesse Phillips ・2 min read

Preamble

So what I realized is this argument is basically the same as this post.

YAML

"YAML is a human friendly data serialization standard for all programming languages."

It has been around since 2001 and recently it has grown in popularity. I'd like to curb this growth.

I don't have a full history of data serialization, but here are some points.

Binary serialization - Some early serialization was a dump of memory to disk. Today, when we hear "cross-platform" we think of operating systems, but hardware also stores data differently (the big one being endianness). As such, this was not a good strategy for sharing.
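The endianness point can be made concrete with a small sketch. It uses Python's standard struct module; the value 1025 and the helper name read_as_big_endian are illustrative assumptions, not anything from the original post.

```python
import struct

value = 1025  # 0x00000401

# The same 32-bit integer, laid out in the two common byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first


def read_as_big_endian(raw: bytes) -> int:
    """Interpret four raw bytes as a big-endian unsigned integer."""
    return struct.unpack(">I", raw)[0]


# A reader that assumes the wrong byte order gets a different number back.
misread = read_as_big_endian(little)
print(little.hex(), big.hex(), misread)
```

A raw memory dump carries no marker saying which order was used, so the reader has to know it in advance.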

Human Readable - XML was a big deal. It could be read and modified by plain text editors and it was based on something familiar: SGML and HTML. It was expressive enough that you could represent anything with it, and we did.

Schema Free - XML was massive and emphasized using schemas even though most parsers didn't require them. This was a pain for the fast-moving web, which didn't want to be brought down by the shackles of standards. JSON didn't need a schema and the web's language could already parse it. (Some people are still trying to get schemas into JSON.)

Human Friendly - Sure HTTP, XML, and JSON can be read by humans, but they aren't very friendly to us mortals. JSON got rid of the tags, but left much in the way with weird punctuation. Enter Markdown YAML a language which looks like a story written by Shakespeare but machine readable too.

Problem is, Shakespeare wrote in an older form of English. The things he wrote look familiar but can have a different meaning, and thus the time period needs to be studied to actually know the meaning.

YAML wants you to feel at home, but if you're using it as a poor man's batch script (looking at you gitlab) then you're going to use things YAML assigns a special meaning to.

You're going to tell me that I should have the batch script in a file and run that instead. Well that is what pushed me over because that is what I was trying to do. Powershell uses & for file execution and YAML uses it for anchors. So now I need to quote the entire command to make it valid.

JSON requires escape inside strings, but this is well defined as you know where it expects structure. YAML trying to be friendly requires extra structure when there is ambiguity.
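As a rough illustration of the "well defined" escaping, here is a sketch using Python's standard json module. The script path and the "script" key are made up for the example.

```python
import json

# A PowerShell-style command (hypothetical script path) containing an
# ampersand, double quotes, and backslashes.
command = '& "C:\\scripts\\build.ps1" -Verbose'

# json.dumps applies JSON's escaping rules: the quote and backslash
# characters inside the string are escaped, while '&' is left untouched.
encoded = json.dumps({"script": command})
print(encoded)

# Decoding returns the original command unchanged.
decoded = json.loads(encoded)["script"]
```

The only characters that need attention are the ones that could end or alter the string itself, and they are always handled the same way.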

On top of that being hard for humans & it being valid writing, YAML is hard to parse. Whitespace has meaning, except if you are writing a list then you have a couple of choices. And while references are supported, how well does that work for you when you need to support JSON reference conversation (looking at you OpenAPI)?

Jesse Phillips

@jessekphillips

Senior Quality Assurance (SDET) ¶ Avid hobby D programmer ¶ Telling people what to do because I am right.

Discussion

 

Something that is great in YAML is that you can add comments. Also, you need to follow indentation rules (which is great for keeping other people from messing up the file). All these things make YAML a great way to serialize configurations.

JSON is not friendly or easy to follow and this can drive you to errors...

Is YAML perfect? No. Is YAML a great option when human intervention is needed? Yes.

 

I'd like to hear your counter argument to the issues I raised, because I'm arguing it isn't a great option when human intervention is needed.

JSON isn't friendly, which is why I put it under "readable". The trouble is you need something with clear rules, because when the human writes it wrong, they'll need to read it through the eyes of a parser, and JSON is simple for that.

 

Well, you can also write a YAML schema. You can also write something wrong in JSON, right? And what is going to correct you is the schema. So, you can have the best of both worlds.

 

YAML is indeed irregular and hard to consistently parse.

It also suffers from the same problems most configuration languages have: in particular it's repetitive and non-programmable. This often leads to writing config generators, which in turn introduce generation failures.

I think Dhall gets all of this right. Its syntax is regular. It's programmable, but not Turing complete. It can't error out or loop infinitely, giving it all the reliability of a static config file without the repetitiveness.

dhall-lang.org/

 

Any parser implementing the specification will consistently parse. What kind of inconsistencies are you talking about?

 

This link summarises some of them:

github.com/cblp/yaml-sucks

The problem is the language specification is poorly defined and full of irregularity and corner cases leading to inconsistent parsers.

Much of the problem comes from YAML allowing unquoted strings giving them nothing to clearly distinguish them from other data types. This leaves parsers in the awkward situation of needing to decide through complicated rules whether a token is a string or something else.
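A minimal sketch of what such rules look like is below. The boolean spellings are the ones listed for the bool type in YAML 1.1. Everything else (the function name resolve_plain_scalar, the reduced integer and float patterns) is a simplification for illustration and not how any particular parser is implemented.

```python
import re

# Boolean spellings listed for the bool type in YAML 1.1.
YAML11_BOOL = re.compile(
    r"^(y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
TRUE_WORDS = {"y", "yes", "true", "on"}


def resolve_plain_scalar(token: str):
    """Guess the type of an unquoted token (simplified, illustrative)."""
    if token in ("", "~", "null", "Null", "NULL"):
        return None
    if YAML11_BOOL.match(token):
        return token.lower() in TRUE_WORDS
    if re.fullmatch(r"[-+]?(0|[1-9][0-9]*)", token):
        return int(token)
    if re.fullmatch(r"[-+]?[0-9]*\.[0-9]+", token):
        return float(token)
    return token  # everything else falls through to a string


for token in ["on", "NO", "1.10", "42", "hello"]:
    print(repr(token), "->", repr(resolve_plain_scalar(token)))
```

Under these rules a country code like NO or a version like 1.10 silently stops being a string unless the author remembers to quote it.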

 

That reminds me of Lua. It isn't Turing complete, but if it has functions I am curious how serialization works with that.

As a write only config I'd recommend Lua.

 

I've never used Lua. But I get the impression the goals are a bit different in that Lua aims to be a programming language for embedding into an application and providing a scripting interface for it. As such I think it's probably a bit heavyweight for a simple configuration language. I expect that using it as a configuration language introduces the same problems that config generation introduces in that it can crash or hang which are usually not properties desirable for a configuration language. But, being an embedded language, it does at least mean your application can load it directly instead of config generation needing to be an intermediate step.

This table summarises what I think are some desirable traits in a configuration language, and how well various languages support them:

Language Feature                       | Binary | INI | JSON         | YAML              | Lua | Dhall                      | Config Gen
Human Readable                         | No     | Yes | Yes          | Yes               | Yes | Yes                        | Yes
Regular. Simple Parse Rules            | No     | Yes | Yes          | No                | Yes | Yes                        | Yes
Variable Bindings                      | No     | No  | No           | Sort of (anchors) | Yes | Yes                        | Yes
Imports                                | No     | No  | No           | No                | Yes | Yes                        | Yes
Functions                              | No     | No  | No           | No                | Yes | Yes                        | Yes
Never Crashes (when well-formed)       | Yes    | Yes | Yes          | Yes               | No  | Yes                        | No
Never Hangs (when well-formed)         | Yes    | Yes | Yes          | Yes               | No  | Yes                        | No
Readily Readable by multiple Languages | No     | Yes | Yes          | Yes               | Yes | Getting There              | Yes
Directly readable by the application   | Yes    | Yes | Yes          | Yes               | Yes | Yes                        | No
Validation Support                     | No     | No  | Yes (Schema) | Yes (Schema)      | No  | Yes (strong, static types) | No

Agreed. Lua has many problems as pure configuration. But it does have ancestry influence to be a configuration language.

I guess I just think there could have been some work on perfecting what it was doing, like a limited subset, LSTN (Lua Simple Table Notation).

BTW, I hate Lua as a language.

 

I would encourage you to avoid telling people what to do. Perhaps say "I don't like YAML, here is why", but trying to put a stop to it, just because it's not for you, seems a bit egotistical. I personally am not a fan of .NET but never would I say "Don't use it!". New developers may come to your article and take what you're saying as truth, and it could hurt them in the long run. It's best to provide examples of why you don't like something, and let people decide for themselves.

 

Exactly. If he doesn't like it, he shouldn't use it or anything that requires it. Don't tell others what to do. JSON, for example, has its own quirks and is super hard for a human to read, so using it for human config is just crazy.

 

If YAML is being human typed, it seems like mistakes (like the & for powershell) would be fairly obvious thanks to syntax highlighting. Many languages have built in keywords/symbols, and users can avoid using keywords as variable names because they are highlighted as soon as they are typed. I use this method especially when using a language I'm not familiar with.

The same argument could be made for PowerShell itself. If I am using cd and the path includes a & (or one of several other special characters) quotes are needed, just like YAML. Maybe you would also criticize PowerShell for this as well, I'm not sure.

If you're ever concerned about ambiguity, just always use quotes. I would argue that English is the root of ambiguity, and whenever a language is made Grandma-readable it necessarily must introduce ambiguity.

It seems a little unfair to say to stop using YAML, but not offer a clear alternative.

I don't think YAML is without fault, but many issues like machine editability (while preserving comments) were not brought up.

I'm not sure what you mean about whitespace only lacking meaning inside of lists. If you could explain that further.

I'm also not sure what you mean about the JSON reference conversation. If you could explain that a bit as well.

 

If you're ever concerned about ambiguity, just always use quotes.

But that is exactly what I hate. Now I must arbitrarily give advice to always quote or learn the many ways yaml will bite you.
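Some of those "many ways" can be sketched as a rough check for when a one-line value is unsafe to leave unquoted. The helper name needs_quotes is hypothetical and the rules are a simplified subset of the indicator rules in the YAML 1.2 spec. A real emitter checks considerably more.

```python
# Characters that YAML reserves as indicators; a plain (unquoted) scalar
# may not begin with them. Simplified from the YAML 1.2 spec.
LEADING_INDICATORS = set("&*!|>'\"%@`#,[]{}")


def needs_quotes(scalar: str) -> bool:
    """Rough check for whether a one-line value is unsafe as a plain scalar.

    Hypothetical helper for illustration only; a real YAML emitter
    applies many more rules than these.
    """
    if scalar == "" or scalar != scalar.strip():
        return True  # empty, or has leading/trailing whitespace
    if scalar[0] in LEADING_INDICATORS:
        return True  # e.g. '&' would be read as an anchor
    if scalar[0] in "-?:" and (len(scalar) == 1 or scalar[1] == " "):
        return True  # '- ' starts a list item, '? ' and ': ' a mapping
    if ": " in scalar or " #" in scalar or scalar.endswith(":"):
        return True  # would be read as a key/value split or a comment
    return False


for value in ["& .\\build.ps1", "*.txt", "echo a: b", "echo hello", "-Force"]:
    print(repr(value), needs_quotes(value))
```

Even this short list shows why "just always quote" ends up as the practical advice.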

Lists can either be indented, or reside at the same level as their parent.

People use "$ref" : by convention in json in order to use references.
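For readers unfamiliar with the convention, here is a minimal sketch of expanding local "$ref" objects. The function names resolve_pointer and expand_refs and the sample document are assumptions made for the example. It handles only references inside the same document and does not guard against cyclic references.

```python
import json


def resolve_pointer(document, pointer: str):
    """Follow a local JSON Pointer such as '#/definitions/Pet'."""
    if not pointer.startswith("#"):
        raise ValueError("only local references are handled in this sketch")
    node = document
    for part in pointer[1:].split("/")[1:]:
        # JSON Pointer escapes: '~1' means '/', '~0' means '~'.
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def expand_refs(node, document):
    """Return a copy of node with every {"$ref": ...} object inlined."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return expand_refs(resolve_pointer(document, node["$ref"]), document)
        return {key: expand_refs(value, document) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(item, document) for item in node]
    return node


spec = json.loads("""
{
  "definitions": {"Pet": {"type": "object"}},
  "response": {"schema": {"$ref": "#/definitions/Pet"}}
}
""")
expanded = expand_refs(spec, spec)
print(json.dumps(expanded["response"]))
```

YAML anchors and aliases are resolved by the parser itself, so a tool that expects "$ref" objects never sees them as references.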

I did not make an alternate recommendation because there are so many out there. Yes, having to learn 20 configuration languages instead of YAML probably isn't worth it.

You mentioned machine editability (preserving comments). I don't see this usage. In fact I don't think I've seen it where the machine writes yaml, only humans.

 

But that is exactly what I hate.

Do you hate this about PowerShell and Bash as well?

I don't think I've seen it where the machine writes yaml

That's because it doesn't work well. References are processed on load/import and that breaks the export, comments are destroyed, the style (quoted or unquoted) is ignored, etc. PyYaml is the only one with debatable support for comment preservation and it has plenty of problems. This is one of the weaknesses of YAML, and why it is a problematic replacement for many setup config files like the package.json.

Thanks for the clarifications, they make sense now.

However, I'm now confused why you would title this "Stop Using YAML" if you still think YAML is better than learning 20 different configuration languages.

Yaml advertising itself as human friendly, then having the same complexity as bash... Bash and powershell don't do that.

It is interesting that you list problems with yaml others are saying it does well... Hmmm...

Maybe another comment has mentioned this, but you might like StrictYAML. It is a subset of YAML designed specifically to weed out all ambiguity. It might even be the solution you want to suggest as an alternative to YAML.

I can understand where you're coming from with Bash and PowerShell. Learning by mistakes was painful and very confusing in Bash, and I honestly think shell languages need a complete redesign. Funnily enough, I actually made an experimental shell to address this by having it accept all arguments in the form of YAML to reduce ambiguity.

I love YAML, but I don't want to pretend that it doesn't have serious flaws. Which is why I mention the problems that it has. I really want it to completely replace JSON, but it is going to need improvements before that is realistic.

For me personally, if it is humans writing YAML with syntax highlighting enabled, I think YAML is the gold standard for config files.

Hi, I am working on the StackStorm open source project to automate some dev opsy things, and the workflows are built in YAML. It is human readable but brittle. I have been wondering if it's the best way to define workflows, which may be more dynamic and could be better expressed differently, such as with bitwise operators.

So I am interested in alternatives without throwing out the baby with the bathwater. So I'm curious about StrictYAML.

But since a workflow ideally has programmability and graph attributes, what other alts are there?

I don't have vast experience with different alternatives. I think one of the big challenges often faced is you want something readily available in multiple languages.

I really like Lua. Not as a programming language, but the lightweight embedded part. They kept the syntax light, but didn't go overboard like YAML.

 

The rise of yaml is the rise of golang. The tools we know yaml from - the dockers, the kubernetes, the so on and so forths - are largely written in golang.

In go world, yaml is trivial to parse. By trivial, I mean: no trouble whatsoever. Go's struct tags, along with marshal/unmarshal, made config parsing a non-problem. In a world where parsing concerns are non-existent on the code side, we naturally err towards DX on the human side.

This doesn't necessarily hold true in Node, Python, or other languages, but - again - yaml's rise is go's rise.


Edit: I processed this little bit

Powershell uses & for file execution and YAML uses it for anchors

This hints more at established languages like C and Go. Ampersand is the traditional syntax for a pointer. Generally speaking, the people who write lang specs aren't going to give too much weight to what powershell does.


Edit: some additions.

  • With Go: every time, I use yaml (for legibility)
  • With Node: Every time, I use JSON (for support).
  • With interop between the two: I lean towards json.
  • In all cases: I wish json was more human-friendly.
 

XML is trivial to parse, and by that I mean someone wrote a lexer and parser so you don't have to.

The majority of my argument was related to the human need to understand the intricacies of the language, but you only focus on the machine aspect in your rebuttal, why?

 

We play a fine line between person and machine. More often than not, the machines win.

If anything, I played the middle ground - the "humans and machines can get along now" side of things.

Really, though, it's purely objective:

  • the tools that have made yaml ubiquitous are written in Go.
  • Go has struct tags and marshaling.
  • Why marshal to json when you can marshal to yaml with zero effort?
  • such is yaml.

When go dies, yaml will die.
Maybe?

Go is not YAML's only friend. It happens that Python syntax shares several similarities with YAML, and using YAML is very natural to a Python programmer. Go may die, but YAML will survive as long as Python does ;-).

 

I think you are being harsh with YAML. There are tools/libraries that generate perfectly valid YAML and linters (and formatters) in every popular text editor to help you understand what you are writing and correct you if you are wrong. If you forget to quote a & character, this is the equivalent of a typo and you make them in every (well or not well) defined language. Read the manual, learn the rules and you will be fine.

 

Read the manual, learn the rules and you will be fine.

Isn't that exactly my point. Did you look into my reference to the uncanny valley?

 

I didn't get that point from your post. Which language have you used without reading the rules first? If you don't, you will make mistakes sooner or later. The fact that you don't like some of these rules is not a valid reason to go on a conquest for people to stop using it. Personally, I think human readability is a totally legit reason for some people, even if it doesn't tick any other box. It's the same reason why some languages enforce indentation and some others don't, some use parentheses and colons and some others don't, and people choose one or the other. It's called flavor and in my mind, the more flavors the better.

I'll check the reference, thanks!

 

Agreed, whitespace sensitivity reminds me of the old-school makefile. It drives me crazy. I have to use an online YAML-to-JSON conversion tool every time to verify correctness. Such a bad idea. And worse, the unnecessary flexibility is full of surprises. For example: if you have a key "on", it will surprisingly be converted to true. I have wasted so much time troubleshooting these kinds of weird problems.

 

Like any language it will have its idiosyncrasies. My recommendation would be to learn them and internalize them, and then they won't distract you as much; they'll stop being a pain point.

That said, "the right tool for the job" rule suggests that calling out to a shell script might be the right thing for anything more complex than a simple command.

Significant whitespace and ampersands-for-macros doesn't a bad language make.

 

I feel like you didn't internalize my arguments. Calling out to a shell script for more complex things is exactly what I was doing.

Needing to internalize all of YAML in order to use it was kind of my point. YAML looks easy and human friendly, but rather than learning a few things which are common across almost any language (strings use quotes, thus quotes need to be escaped, and thus escaping needs an escape), you need to know a very special language. So why not instead internalize a simple, but annoying language?

 

YAML is good for simple use, like Windows INI. It is good for collecting data from many sources. It is good for simply appending data to a file.

But I prefer XML for complex hierarchical data structures. XML schema is good, but schema-free well-formed XML is good enough. Elements, attributes, entities, comments, processing instructions, charsets, tools, ... all you need is XML.

 
 

I have not really used it, other than for GitLab config.

Reading its spec I'd say it probably isn't as problematic. Though it may have different issues since the same data structure can be defined multiple ways and is invalid.

 

Don't you think that it's not the tool's problem but the way how it's used?

 

Yes, I'm calling out two uses people should reconsider. I'm giving an explanation of the issues so that others don't make the same mistake.

What I'd argue is that YAML provides too much and draws people to use it inappropriately. It makes me wonder what an appropriate usage is.

 

Well, any usage that solves your current problems is an appropriate one. But once you bump into problems, it could mean one of two things: the tool is being used in the wrong way, or the tool itself is wrong. Mostly, it's being used the wrong way. IMO, at that point it is better to learn the tool more deeply rather than switching to some other tool. It would be great to see an example that drives you mad. It could clarify things; in my experience I've never had YAML files so big that they were hard to parse. And in my experience working with Rails, even very big files (translations, usually) don't bring any problems or irritation. Yes, sometimes it's hard to follow indentation. But with JSON it's hard to follow the parentheses, and the indentation as well, if it's for humans. So I would say JSON is even worse.

 

YAML is precisely defined: go to yaml.org and RTFM.

(BTW, XML is not "based" on HTML. It is a subset of SGML.)

 

ReRead what I wrote.

 

If you know the spec, you should never be "surprised".

And, I don't know how it could be "hard to parse". There are parser libraries for every language and they work perfectly well.

Maybe the sole fault of YAML is to look so simple that people think they don't need to read the spec. Experienced IT'ers do not fall in this trap.

 
 

Sounds like the real problem is you're using powershell.