
yaythomas for pypyr


Comparison of Python TOML parser libraries

Update February 2022: Since the original article below was written, PEP 680, which adds TOML parsing to the standard library, has been approved.
So tomli, as recommended below, will effectively be included in Python 3.11, as the standard-library module tomllib.

If, like most of us, you're going to have to support older versions of Python for a few years yet, read on: this article talks you through your options!
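As a rough sketch of what supporting both old and new interpreters can look like, the snippet below imports the standard-library tomllib on Python 3.11+ and falls back to the third-party tomli package otherwise. The sample [build-system] table is an illustrative assumption, not taken from any real project.

```python
import sys

# tomllib ships with Python 3.11+ (PEP 680); older interpreters need
# the third-party tomli package, which exposes the same load/loads API.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # pip install tomli

# Hypothetical pyproject.toml content, used only for illustration.
PYPROJECT = """
[build-system]
requires = ["setuptools>=61", "wheel"]
build-backend = "setuptools.build_meta"
"""

config = tomllib.loads(PYPROJECT)
build_backend = config["build-system"]["build-backend"]
print(build_backend)
```

Aliasing tomli as tomllib keeps the rest of the code identical on every supported Python version.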

The pypyr automation pipeline task-runner open-source project recently added TOML parsing & writing functionality as a core feature. To this end, I researched the available free & open-source Python TOML parser libraries to figure out which option to use.

If you're interested in a walkthrough of the decision-making process, this is documented in an Architecture Decision Record (ADR) you can read here: adr003 toml in the pypyr core

Hopefully sharing my notes helps someone else going through this process to save some time...

TOML is becoming a popular configuration format, and in Python it is now unavoidable in the shape of pyproject.toml, as accepted by PEP 518.

Since "best" is a loose term best left to click-bait headlines (hoho, see what I did there? 😉), instead of asking "which TOML parser is the best?" the more sensible question is: which TOML library best suits the requirements of your project, with the fewest drawbacks?

Python TOML libraries at a glance

  • tomli
    • Relatively fast read-only parsing.
    • Companion library tomli-w for writing.
  • tomlkit
    • Round-trip white-space/style preserving.
  • toml
    • This was initially vendored in pip itself to deal with pyproject.toml.
    • Even so, pip has since moved to tomli.
  • pytoml
    • Abandoned by the creator for eminently sensible reasons (an interesting read too: https://github.com/avakar/pytoml/issues/15). But let's not get into an argument over whether a shiny new fashion in config formats is in fact doing anything better than the previous fashions in config management...
  • qtoml
    • Still on TOML v0.5.0.

Note that these are the pure Python parsers - there are also others that are basically interop wrappers for fast C++ or Rust libraries.

  • pytomlpp
    • Python wrapper for the toml++ C++ library.
  • rtoml
    • Python wrapper for fast parsing written in Rust.

If performance is your main concern, then the C++/Rust implementations might serve your needs, assuming you're fine with these not being pure Python packages.
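If you want to check whether the speed difference matters for your own files, a quick timing harness along the following lines can help. This is only a sketch: time_parsers, CANDIDATES and SAMPLE are hypothetical names, the generated document is synthetic, and any parser that isn't installed is simply skipped. Treat the numbers as a rough indication rather than a rigorous benchmark.

```python
import importlib
import timeit

# (module name, parse function) pairs. Everything except tomllib
# (stdlib, Python 3.11+) is a third-party package and is skipped
# when it is not installed.
CANDIDATES = [
    ("tomllib", "loads"),
    ("tomli", "loads"),
    ("tomlkit", "parse"),
    ("toml", "loads"),
    ("rtoml", "loads"),
    ("pytomlpp", "loads"),
]

# Synthetic TOML document: 200 small tables.
SAMPLE = "\n".join(
    f'[table{i}]\nkey = "value {i}"\nnumbers = [1, 2, 3]\n' for i in range(200)
)


def time_parsers(source, repeat=20):
    """Return {module name: seconds for `repeat` parses} for installed parsers."""
    results = {}
    for module_name, func_name in CANDIDATES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # parser not available in this environment
        parse = getattr(module, func_name)
        results[module_name] = timeit.timeit(lambda: parse(source), number=repeat)
    return results


timings = time_parsers(SAMPLE)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:10s} {seconds:.4f}s")
```

Swap SAMPLE for the contents of one of your real config files to get numbers that reflect your actual workload.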

Let's investigate the pure Python libraries in greater depth:

tomlkit

Of the available options, only tomlkit supports style-preserving round-trip parsing. Furthermore, tomlkit was created for the express purpose of handling TOML parsing for the poetry tool. As poetry is one of the 2 most popular new PEP 517 & PEP 518 Python build systems, there is some comfort to be had in its wide adoption: a very actively used tool means a greater likelihood of continued maintenance & support, and specifically that pyproject.toml files should parse without surprises.

tomlkit only lists itself as 1.0.0rc1 compliant. Looking at the TOML spec release history, the delta between rc1 and v1 looks like clarifications & administrative/documentation updates only; there doesn't seem to be anything notable missing or functionally different in rc1 as opposed to v1. It's not impossible that I missed something, though. But given tomlkit's wide usage via poetry, I would expect obviously out-of-date spec handling to have been noticed by someone somewhere, and I see nothing of the sort in the issues list.

There is a but... tomlkit outputs custom types rather than just the standard Python built-ins like dict. Specifically, it represents tables with classes like class Table(Item, MutableMapping, dict) or class InlineTable(Item, MutableMapping, dict).

(See here for TOMLkit API types.)

The constructors for these classes do NOT let you instantiate them the way you would a standard Mapping type such as dict, which may or may not fit your needs, depending on what exactly you're doing. It probably doesn't really matter for most purposes.
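To make the round-trip behaviour and the custom types concrete, here is a minimal sketch using tomlkit.parse and tomlkit.dumps. The SOURCE document and the helper names roundtrip_unchanged and edit_name are made up for illustration. Since tomlkit is a third-party package, the helpers return None when it isn't installed.

```python
try:
    import tomlkit  # third-party: pip install tomlkit
except ImportError:
    tomlkit = None

# Hypothetical config with a comment and deliberately odd spacing.
SOURCE = """# build settings
[server]
name   = "example"
port = 8080
"""


def roundtrip_unchanged(source):
    """Parse and dump without edits; None if tomlkit is unavailable."""
    if tomlkit is None:
        return None
    return tomlkit.dumps(tomlkit.parse(source))


def edit_name(source, new_name):
    """Change one value and dump the document; None if tomlkit is unavailable."""
    if tomlkit is None:
        return None
    doc = tomlkit.parse(source)
    table = doc["server"]
    # The table behaves like a dict, but it is tomlkit's own type.
    assert isinstance(table, dict) and type(table) is not dict
    table["name"] = new_name
    return tomlkit.dumps(doc)


unchanged = roundtrip_unchanged(SOURCE)
edited = edit_name(SOURCE, "renamed")
print(unchanged == SOURCE if unchanged is not None else "tomlkit not installed")
```

An unedited document should come back character for character, and the leading comment should survive the edit. That is the property that sets tomlkit apart from the other parsers here.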

toml

The toml library is a largely historical artifact at this point, not only because it is well behind on implementing TOML v1, but also because of a lack of maintenance on its existing functionality.

Even pip itself has moved from vendoring toml to tomli. This exodus from toml to tomli includes the very prominent:

(The links are to the issues/PRs discussing the reasons why...)

tomli

tomli, then, seems to be where the Python community in general is coalescing for a "standard" TOML parser. tomli is read-only. For write functionality there is the companion library tomli-w.

tomli is explicitly TOML v1.0 compliant.

tomli is significantly faster than tomlkit. It does not, however, preserve style/whitespace like tomlkit does. For most use-cases, arguably, reading TOML is the important part...
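The read/write split between tomli and tomli-w looks roughly like the sketch below. It assumes tomllib (the standard-library port of tomli) or tomli for reading, and treats tomli_w as optional so the snippet still runs without it. The RAW document is invented for illustration.

```python
import io
import sys

if sys.version_info >= (3, 11):
    import tomllib  # stdlib port of tomli
else:
    import tomli as tomllib  # third-party: pip install tomli

try:
    import tomli_w  # third-party companion writer: pip install tomli-w
except ImportError:
    tomli_w = None

# Hypothetical config; the comment will not survive parsing.
RAW = b'title = "demo"\n\n[owner]\nname = "Tom"  # comment is dropped on parse\n'

# load() expects a binary file object; BytesIO stands in for open(path, "rb").
data = tomllib.load(io.BytesIO(RAW))

# The result is made of plain built-ins, so ordinary dict operations apply.
data["owner"]["name"] = "Alice"

# Writing is a separate package. The output is valid TOML, but comments
# and the original layout are not preserved.
rewritten = tomli_w.dumps(data) if tomli_w is not None else None
print(rewritten if rewritten is not None else data)
```

Because the parsed result is an ordinary dict, it can be passed to any code that expects standard Python types, at the cost of losing comments and formatting on the way back out.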

Summary

Use tomlkit if you need to round-trip & preserve style + comments.

Use tomli if you just need to read a config file, or (with tomli-w) write some output without caring too much about the formatting.

Use the Rust/C++ interop libraries if performance is your main concern and you do not have limitations on using packages that aren't pure Python.

In the case of pypyr, tomli matched the requirements with the least trade-offs. And as a mini-review, it was a joy to use 😄.

If you're interested in seeing a real-world usage example for tomli & tomli-w, you can check it out in action here: https://github.com/pypyr/pypyr/blob/main/pypyr/toml.py
