DEV Community

Parsing Config Files The Right Way

Dimitri Merejkowsky on September 09, 2017

First published on my blog. Parsing configuration files is something we programmers do every day. But are you sure you're doing it the proper way? ...
 
Massimo Artizzu • Edited
  • File is easier to read for humans. Compare: (overly spaced JSON data follows)

Well that's just your preference of writing a JSON file. I'd have written something like this:

{
  "auth": {
    "github": {
      "token": "ab642ef9zf"
    }
  }
}

More verbose, yes, but not that much. Node's console.log would have inlined it all.

  • Syntax is well-defined and all implementations behave the same

?!
Are you sure about that? That's quite a statement, since YAML is a superset of JSON (really!), so any problem with parsing JSON carries over to YAML. Plus you have YAML's own syntax.

Moreover, YAML is extensible, meaning that it can be impossible to parse a YAML file that's been extended for another platform.

There are plenty of reasons why one could prefer YAML over JSON, and you explained quite a few, but IMO these two aren't among them.

  • Whitespace is significant, so the file has to be properly indented.

While this sounds nice for readability, this also means that you need a validator to be fairly sure that your config file is ok, because if you mess up with the indentation the file is still considered valid YAML. If you miss a closing brace in JSON or a closing tag in XML, it won't parse.

If you don't want one, the rule of thumb is to avoid deeply nested YAML documents, which means this perk loses most of its meaning:

  • Elements can be arbitrarily nested
 
Dimitri Merejkowsky

Thanks for the feedback! A few remarks.

About JSON not being well-defined

See seriot.ch/parsing_json.php. True, most of the time you won't have any problem using different implementations of JSON parsers, but the devil is in the details (things like dates, text encoding, trailing commas or not-a-number floats).

YAML is a superset of JSON but its specification is more precise.
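
To make the "devil is in the details" point concrete, here is a small sketch that only uses Python's standard `json` module. It shows how this one parser treats three of the edge cases mentioned above; other parsers may well decide differently, which is the whole problem.

```python
import json
import math


def parses(text):
    """Return True if Python's json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Trailing commas: rejected by this parser, tolerated by some others.
trailing_comma_ok = parses('{"a": 1,}')

# Not-a-number floats: the JSON grammar has no NaN, yet the default
# settings of this module both read it and write it.
nan_value = json.loads("NaN")
nan_text = json.dumps(float("nan"))

# Duplicate keys: no error here, the last value silently wins.
duplicate = json.loads('{"token": "first", "token": "second"}')

print(trailing_comma_ok, math.isnan(nan_value), nan_text, duplicate)
```

None of this matters for a well-behaved config file, but it does show that "valid JSON" can mean slightly different things depending on the implementation.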

you need a validator to be fairly sure that your config file is ok,

Not sure what you mean by that. Personally, whenever I'm editing json, xml or yaml files, I have a linter that tells me if the syntax is OK.

If you mess up with the indentation the file is still considered valid YAML.

True. That's an argument that always comes back when you talk about whitespace significance. Python has the same problem, but personally I don't care that much. Fortunately, if this is an issue for you there are lots of alternatives.

avoid deeply nested (YAML) documents,

This is good advice and it applies to any configuration file ;)

Massimo Artizzu • Edited

YAML is a superset of JSON but its specification is more precise.

I'm not sure I'm following you here. The format specification of JSON is flawless (as far as we know); the parser implementation specification, on the other hand, is left with more freedom as it's intended as an interoperable format and thus the result depends on the language that has to deal with it.

Now, I doubt YAML is re-defining the JSON format spec, but maybe you're talking about the implementation?

Anyway, the page you posted provides a lot of useful tests, but it's all about edge cases, and it would be quite odd to hit an edge case in a configuration file.

Not sure what you mean by that. Personally, whenever I'm editing json, xml or yaml files, I have a linter that tells me if the syntax is OK.

I have to clarify indeed: I meant that you need a schema to validate your YAML. As an example, consider the following:

{
  "brands": {
    "BMW": [ "Z4" ],
    "Chevrolet": [ "Matiz" ],
  "Ferrari": [ "458" ]
  }
}

If I mess up the indentation, I still have a good JSON. If I do the same with YAML it's quite different:

brands:
  BMW: [ "Z4" ]
  Chevrolet: [ "Matiz" ]
Ferrari: [ "458" ]

A linter wouldn't catch it. That's why I suggest keeping config files as flat as possible :/
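
A rough sketch of what such a schema check could look like, using only the Python standard library. Two things are assumptions for the sake of the example: the YAML result is written out by hand as the dictionary a YAML loader would return for the mis-indented document (to avoid a third-party dependency), and `check_config` is a hypothetical hand-rolled validator, not the API of any real schema library.

```python
import json

# JSON: indentation is cosmetic, so the badly indented text parses the same.
messy_json = """{
  "brands": {
    "BMW": [ "Z4" ],
    "Chevrolet": [ "Matiz" ],
  "Ferrari": [ "458" ]
  }
}"""
expected = {"brands": {"BMW": ["Z4"], "Chevrolet": ["Matiz"], "Ferrari": ["458"]}}
json_ok = json.loads(messy_json) == expected

# YAML: with "Ferrari" dedented, it becomes a second top-level key.
misindented_yaml_result = {
    "brands": {"BMW": ["Z4"], "Chevrolet": ["Matiz"]},
    "Ferrari": ["458"],
}


def check_config(config):
    """Return a list of problems; an empty list means the shape is fine.

    Hypothetical schema: a single top-level key, "brands", mapping
    brand names to lists of model names.
    """
    problems = []
    unexpected = sorted(set(config) - {"brands"})
    if unexpected:
        problems.append(f"unexpected top-level keys: {unexpected}")
    brands = config.get("brands")
    if not isinstance(brands, dict):
        problems.append("'brands' must be a mapping")
    else:
        for name, models in brands.items():
            if not (isinstance(models, list)
                    and all(isinstance(m, str) for m in models)):
                problems.append(f"models for {name!r} must be a list of strings")
    return problems


print(json_ok)
print(check_config(expected))
print(check_config(misindented_yaml_result))
```

The syntax is fine in both cases; only a check on the expected shape notices that "Ferrari" ended up in the wrong place.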

Dimitri Merejkowsky • Edited

you're talking about the implementation?

Oh yes. Sorry.

If I mess up the indentation, I still have a good JSON. If I do the same with YAML it's quite different

Right. I see what you mean. Again, nothing new under the sun. You have exactly the same problem in Python.

Trivia: in a big Python project I was working on for quite a long time I only had a few bugs caused by incorrect indentation, but I can see why it's a big deal for lots of people :)

And the solution is the same: keep your code "flat" by using nice little helper functions.

Adrian B.G.

My suggestion is do NOT store tokens, auth, sensitive info in config files, even if they are in .gitignore. Switch to env variables, it's better from many perspectives.

Keep in your config files the things that do not change between your environments (staging, production, dev). See the 12factor app for some reasons in favor, and here are some reasons against.
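
As a minimal sketch of the idea, the token can be read from the environment at startup instead of from a file. The variable name `GITHUB_TOKEN` and the helper `get_github_token` are made up for this example.

```python
import os


def get_github_token(environ=os.environ):
    """Read the token from the environment instead of a config file.

    GITHUB_TOKEN is a hypothetical variable name chosen for this example.
    """
    token = environ.get("GITHUB_TOKEN", "").strip()
    if not token:
        # Fail early and loudly rather than running without credentials.
        raise RuntimeError("GITHUB_TOKEN is not set")
    return token


# Passing a plain dict keeps the function easy to test.
print(get_github_token({"GITHUB_TOKEN": "ab642ef9zf"}))
```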

I think JSON is better for JS projects. The main reason: it's simpler. You don't need a parser, and simple is better.

  • it maps directly in your code
  • it's javascript
  • your team does not need to learn a new config schema
  • it's already widely used, e.g. Meteor
 
Dimitri Merejkowsky

All good advice, and the link about pros and cons about environment variables is very interesting.

But note that the context is a bit different: the twelve-factor app is about software as a service, and my article is about a command-line tool.

Also, you're right, if you're using JavaScript already, having config files in JSON (or even in JavaScript code!) is certainly a good idea.

Adrian B.G.

Sorry, my bad. I thought the app was a Node.js web app.

Whenever I start using hosting services, the team gets bigger, and the project grows, I hit the config files problem.

You can also fix some of the "cons" of ENV vars by using a stub file listing all the ENV vars, which can also be used as the local DEV env. Some hosting providers also allow keeping ENV vars in files, but that defeats the purpose.
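
A possible shape for that stub file approach, sketched in Python with the standard library only. The `KEY=VALUE` format, the variable names and both helper functions are assumptions made up for this example.

```python
import os


def parse_env_stub(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_env(stub_text, environ=None):
    """Start from the stub's defaults, then let real ENV vars override them."""
    environ = os.environ if environ is None else environ
    merged = parse_env_stub(stub_text)
    # Only variables declared in the stub are picked up from the environment.
    merged.update({k: v for k, v in environ.items() if k in merged})
    return merged


# Hypothetical stub: every variable the app reads, with local DEV defaults.
STUB = """\
# Local DEV defaults
DATABASE_URL=sqlite:///dev.db
LOG_LEVEL=debug
"""

print(effective_env(STUB, {"LOG_LEVEL": "warning", "HOME": "/root"}))
```

The stub doubles as documentation of which variables exist, and secrets can simply be left out of it so they only ever come from the real environment.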

Ry Whittington

Great tips. I'm a huge fan of Schema. For me, YAML's strength is that it is both more readable and more writable.

Alex Miasoiedov

I usually use a YAML loader plus a schema validation lib: github.com/tailhook/trafaret-config