DEV Community

Juan Luis Cano Rodríguez


Futuristic documentation systems in Python, part 1: aiming for more


I love Python, and I love writing documentation.

In fact, I was so lucky that at some point I landed a job as Developer Advocate for Read the Docs, so for most of 2021 I was able to deeply engage with the Sphinx community, contribute the first actual Sphinx tutorial, push for wider adoption of MyST as the way to go for Python documentation, and much more.

However, while I love how powerful Sphinx is and how the project is still going 15 years after its creation, let's just say that Sphinx is a little bit tricky at times.

"Sphinx is NP-Hard"

Recently I started a position as Developer Advocate for Kedro, an opinionated data science framework, and one of the things we're doing is exploring which open source tools are best suited for creating our documentation.

It turns out that this sent me down interesting rabbit holes on the state of the Sphinx alternatives, and I started to connect some dots and imagine what a futuristic toolchain for documenting Python projects could look like:

Screenshot of my post on Mastodon "Is there something like OpenAPI (hence an open specification) but instead of targeting HTTP APIs, focusing on software libraries in general?"

(link to original post on Mastodon)

In the end, I have managed to talk with many smart folks, collect a fair number of links, and form a decent body of thought that I wanted to share with the broader community. It also makes the perfect excuse for me to start blogging again!

In the first part of this post I will describe the broader context of documentation systems, with special attention to Python compared to other ecosystems, and justify why I want more powerful systems. In the second part, I will cover some of the specific problems I have with the current documentation systems that exist in Python, with a focus on Sphinx and MkDocs. And in the third part, I will take a futuristic view of how we could double down on MyST and try to expand its popularity, as well as leverage standalone docstring parsing that is not tied to a particular documentation system, to create better toolchains.

Let's go!

Part 1: Aiming for more

First of all, I think it's essential to reflect a bit on what a documentation system is or does. In my view, tools like Sphinx or MkDocs basically boil down to these two things:

  1. a docstring parsing mechanism, and
  2. a static site generator.

The former enables generating API references, and the latter enables narrative documentation in addition to assembling everything together in a coherent way.
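To make the two pieces concrete, here is a minimal sketch using only the standard library. It is not how Sphinx or MkDocs are implemented; the `greet` function and the helper names `collect_api` and `render_page` are hypothetical and exist only for illustration.

```python
import inspect


def greet(name: str, punctuation: str = "!") -> str:
    """Return a friendly greeting for ``name``."""
    return f"Hello, {name}{punctuation}"


def collect_api(objects):
    """Piece 1 (docstring parsing): gather name, signature and docstring."""
    entries = []
    for obj in objects:
        entries.append(
            {
                "name": obj.__name__,
                "signature": str(inspect.signature(obj)),
                # getdoc() cleans up indentation; fall back to an empty string
                "doc": inspect.getdoc(obj) or "",
            }
        )
    return entries


def render_page(title, narrative, entries):
    """Piece 2 (site generation): assemble narrative text and API reference."""
    lines = [f"# {title}", "", narrative, "", "## API reference", ""]
    for entry in entries:
        lines.append(f"### `{entry['name']}{entry['signature']}`")
        lines.append("")
        lines.append(entry["doc"])
        lines.append("")
    return "\n".join(lines)


page = render_page("Greeter", "A tiny example library.", collect_api([greet]))
print(page)
```

A real documentation system does far more in both halves (cross-references, themes, search), but the split between "extract the API" and "assemble the site" is the same.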

(There's of course a separate question of what makes a good SSG for technical documentation - extensibility, powerful cross-referencing features, good syntax for code blocks, and we could go on - but let's dump all that into static site generators for now.)

Documentation systems for Python are so limited not for lack of static site generators (after all, it doesn't matter much which language an SSG is written in, as long as it renders the markup) but for two main reasons:

  1. Historically, the docstring parsing systems available were coupled with their companion SSGs. In other words: one cannot simply use Sphinx's autodoc extension outside of Sphinx, or mkdocstrings outside of MkDocs.
  2. Python's choice of reStructuredText (which made absolute sense back then because Markdown didn't even exist) and the late development of MyST (the first really extensible and CommonMark-compliant dialect of Markdown that brings all the powerful features from reST) mean that most of the SSGs outside Sphinx have evolved their own Markdown dialects that are incompatible with the syntax choices the MyST people made. For example, Hugo uses shortcodes (effectively a templating language) inside Markdown for cross references, and MkDocs uses Python-Markdown, which is not even a CommonMark implementation and as a result has evolved a bespoke extension mechanism.
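As a rough illustration of what decoupling the first point could look like, the sketch below extracts docstrings statically with the standard library `ast` module and emits plain JSON that any site generator could, in principle, consume. The `SOURCE` snippet, the `extract_docstrings` helper, and the record layout are all assumptions made up for this example, not an existing tool or specification.

```python
import ast
import json

# Toy source code; in practice this would be read from a .py file.
SOURCE = '''
"""Toy module used only for this example."""

def area(width, height):
    """Return the area of a rectangle."""
    return width * height

class Shape:
    """Base class for shapes."""

    def describe(self):
        """Return a short description."""
        return "a shape"
'''


def extract_docstrings(source):
    """Collect docstrings from the syntax tree without importing the code."""
    tree = ast.parse(source)
    records = [{"kind": "module", "name": None, "doc": ast.get_docstring(tree)}]
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            records.append(
                {"kind": kind, "name": node.name, "doc": ast.get_docstring(node)}
            )
    return records


records = extract_docstrings(SOURCE)
# JSON is SSG-neutral: nothing here depends on Sphinx or MkDocs.
print(json.dumps(records, indent=2))
```

Because the output is plain data rather than Sphinx or MkDocs internals, the rendering step can live in a different tool, or even a different language.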

The result of this historical evolution is that the world of static site generators has made amazing progress over the past few years and the Python ecosystem hasn't caught up with it. Nowadays there's a rich ecosystem of different options for static site generation, some of which lean more towards server side generation and simplicity (like Hugo) while others leverage Single Page Application frameworks like React.js or similar (like Docusaurus). On top of that, there are "headless CMSs" that can use some of these SSGs as a backend, offering a more sophisticated authoring experience while offloading the HTML output generation to a different component (like Decap CMS, Ghost, or Forestry).

What I'm trying to figure out is: as a team developing a Python library, what if we could use any SSG of our liking to create our documentation, while keeping the underrated features of MyST and the ability to generate API references? In other words, would it be possible to create documentation sites for Python projects using Hugo or Docusaurus as backends?

Wait for the second part of this post!

Top comments (1)

coderatul

Did you find someone who is interested in this project? Because I am.