Anatoly Scherbakov

Posted on Jul 31, 2020

Reflections on Literate Programming

#literateprogramming #documentation #knuth

Introduction

Documenting and commenting your program is an activity many programmers despise. Documentation is either never written or gets abandoned and rots, becoming irrelevant and depressing.

However, a completely undocumented program is hard to understand and maintain, to the point where it can no longer be supported or used. That is a risk.

Literate Programming

The term Literate Programming (LP) was coined by Donald E. Knuth in a 1984 article. To quote from https://literateprogramming.com:

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

Rather than embedding documentation comments into code, a literate programmer embeds pieces of code into the essay (article, book) which tells the reader about what these pieces of code are doing, and why.

Example

Knuth has been using literate programming to write TeX and many other programs over the years. Here is one of them -- a port of an old Adventure game to Knuth's literate programming tool called CWEB. Check out this PDF.

This document is one of the two outputs of the CWEB system -- a reader-friendly book (107 pages long) which fully describes the game, including its annotated source code.

The other output is the code itself in C, ready for compilation.

When writing the literate source, you can interleave and link blocks of code and blocks of text. Thus, the document can be written in the order that is easiest for a human to read, while the code is rearranged into the specific order the compiler requires.
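The rearrangement step can be illustrated with a minimal sketch. It assumes a made-up chunk syntax in which `<<name>>` refers to another named chunk; the chunk names and the `tangle` helper are hypothetical and do not reproduce CWEB's actual format.

```python
import re

# Chunks in the order an essay might introduce them: the interesting part
# first, the boilerplate last. The <<name>> reference syntax is an assumption
# made for this sketch; it is not CWEB's real format.
CHUNKS = {
    "greet the player": 'print("Welcome to Adventure!")',
    "program": "<<imports>>\n<<greet the player>>",
    "imports": "import sys",
}

REFERENCE = re.compile(r"<<(.+?)>>")


def tangle(name, chunks):
    """Expand chunk `name`, recursively replacing every <<reference>>."""
    return REFERENCE.sub(lambda m: tangle(m.group(1), chunks), chunks[name])


# The output follows the order the machine needs, not the order of the essay.
print(tangle("program", CHUNKS))
```

The sketch does not guard against chunks that refer to each other in a cycle; a real tool would have to.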

Reasoning

In a talk named Literate Programming in the Large, Timothy Daly, a long-time developer of the Axiom computer algebra system, urges us - fellow developers - to change the situation, suggesting that LP is the only (and best) way to build maintainable software systems. Why so?

Change of perspective. From the LP point of view, the essence of a computer program is communicating its purpose and logic to a human reader. It is only incidental that the program can also be executed by a computer.

The Why. The human reader wants to understand why the program is doing what it does. This ensures that the reader will be able to change the program with full understanding of the implications.

This means the reader will be able to maintain the program even if the original author is no longer available.

Adoption

The Literate Programming - Tools page lists a few of the available tools which support the Literate Programming technique. But the overall adoption in the industry seems, at least to me, rather limited.

Knuth himself states that, since he developed the LP technique, he has found himself more productive and the quality of his programs has improved. Indeed, TeX is probably one of the best-quality pieces of software of all time. The question is, though: can the positive impact of LP that Knuth observed apply to the majority of programs and teams?

We don't have statistical evidence to support or refute that; thus, I am going to list a few very opinionated and subjective points on the matter.

Expressivity of programming language

Daly gives an example from the history of Axiom. He was once looking at a piece of its code and did not understand why that piece was there.

The code did obvious things with some bytes in memory, and Daly knew exactly what the code was doing (he was likely the author) - but he couldn't ascertain why it was doing that. That's why an essay-like description would have helped.

But I would say that the code which controls the logic of the application should not manipulate bytes in memory directly; it should be written in a much more abstract way. Like this, for example:

Warehouse().find_package(package_id).location
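To make this concrete, here is a minimal sketch of what could stand behind such a line. Only the names `Warehouse`, `find_package`, `package_id` and `location` come from the line above; the `Package` and `Location` types, the `store` method, and the sample data are assumptions added for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    """Where a package sits (hypothetical fields)."""
    aisle: str
    shelf: int


@dataclass(frozen=True)
class Package:
    package_id: str
    location: Location


@dataclass
class Warehouse:
    # Packages indexed by their id; an in-memory stand-in for real storage.
    packages: dict = field(default_factory=dict)

    def store(self, package):
        self.packages[package.package_id] = package

    def find_package(self, package_id):
        return self.packages[package_id]


warehouse = Warehouse()
warehouse.store(Package("PKG-42", Location(aisle="B", shelf=3)))
print(warehouse.find_package("PKG-42").location)
```

The calling code talks about warehouses, packages and locations; no bytes are in sight.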

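TeX was originally written in Pascal (through Knuth's WEB system), and today it is commonly built by translating that source to C. Both languages are quite ascetic by modern standards - in terms of expressivity, available data structures, type systems, libraries, and possible syntax extensions.

Modern languages give us much more to work with: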

Lambdas (anonymous functions) make the code more concise where previously you would have had to create a separate named function;
map, filter, reduce and friends replace explicit loops;
A garbage collector or a bounds checker lets us skip the details of memory management;
A type system helps to express concepts, not byte sequences, in code.
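As a small illustration of the first two points, here is the same computation written as an explicit loop and as a pipeline of `filter`, `map` and `reduce`. The weights and the 10 kg threshold are made up for the example.

```python
from functools import reduce

weights_kg = [12, 7, 30, 5]  # hypothetical package weights

# Explicit loop: the reader has to track the accumulator by hand.
total_loop = 0
for weight in weights_kg:
    if weight >= 10:
        total_loop += weight * 1000

# The same computation as a pipeline: keep the heavy packages,
# convert kilograms to grams, then sum everything up.
heavy = filter(lambda w: w >= 10, weights_kg)
in_grams = map(lambda w: w * 1000, heavy)
total_pipeline = reduce(lambda acc, w: acc + w, in_grams, 0)

print(total_loop, total_pipeline)
```

Each stage of the pipeline names one idea, which is roughly the kind of explanation a prose paragraph would otherwise have to supply.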

All of this reduces the need for a lot of extra explanation. In the Adventure game example above, there would be no need to write

#include <stdio.h> /* basic input/output routines: fgets, printf */

Instead, we could write (this is pseudocode, I have no idea whether Rust has a stdio crate):

use stdio::{fgets, printf};

The need for explanation goes away. The code speaks for itself.

Power of Notation

In the Middle Ages, European mathematicians had to communicate their ideas in natural language. Proofs were incredibly hard to understand and reason about when written in lengthy prose. Or, sometimes, in rhymes:

When the cube and its things near
Add to a new number, discrete
Determine two new numbers different
By that one; this feat
Will be kept as a rule
Their product always equal, the same
To the cube of a third
Of the number of things named.
Then, generally speaking
The remaining amount
Of the cube roots subtracted
Will be our desired count.

(Source: Tartaglia's Poem - ProofWiki)

This is Niccolò Tartaglia's description of the formula for the roots of a cubic equation.

The invention of algebraic notation was a great step forward which enabled us to reason about much more complex concepts - and still not have our heads explode.
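For comparison, here is the rule from the poem in modern notation. This is my reading of the verses, with the symbols p, q, u and v chosen for this sketch: "the cube and its things" is the left-hand side of the equation, and the "two new numbers" are u and v.

```latex
% The equation the poem solves: the cube and its things equal a number.
x^3 + p x = q
% Find two numbers u and v that differ by q and whose product
% is the cube of a third of the number of things:
u - v = q, \qquad u v = \left(\frac{p}{3}\right)^3
% The answer is the difference of their cube roots:
x = \sqrt[3]{u} - \sqrt[3]{v}
% Solving the two conditions for u and v gives the closed form:
u = \sqrt{\frac{q^2}{4} + \frac{p^3}{27}} + \frac{q}{2}, \qquad
v = \sqrt{\frac{q^2}{4} + \frac{p^3}{27}} - \frac{q}{2}
```

Cubing x and substituting the two conditions gives x^3 = q - p x, which is the original equation. Twelve lines of verse fit into four lines of formulas, and each step can be checked mechanically.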

Today, mathematics uses many special symbolic languages to manage extremely complex and abstract concepts. They are expected to be concise, strict, unambiguous, and versatile.

A programming language is actually even stricter than the most abstract mathematical paper. Vladimir Arnold, a renowned mathematician, says in one of his lectures that Nicolas Bourbaki, in their series of books on the foundations of mathematics, aspire to be as strict as possible in the first books, but their grip kind of loosens from one book to the next. That will not work with computers, though.

Alternatives or complements to LP

If we side with Literate Programming, it looks as though we have given up hope that the programming language can convey the concepts of the domain. Its expressiveness is poor, and we resort to natural language with all of its limits - ambiguity, verbosity, vagueness - to do the job.

Maybe we should do something different?

Yes, I think we should.

Take inspiration from mathematics

Programming is, essentially, a specialized branch of mathematics (at least that's what Alexander Stepanov, the author of the C++ STL, says in his talk at Yandex). Mathematics is the art of abstraction in its purest form. We struggle with abstraction in programming. That's why I almost irrationally believe that mathematics must have methods and ways to help us make our programs better.

Anyway, if you are familiar with

mathematical logic,
graph theory,
finite automata theory,
category theory,
type theory

...and/or any special areas which may be relevant to your sphere of interests - this won't hurt you.

The functional programming ideas which we briefly touched on above are now practically mainstream. They were derived entirely from mathematics and only over time found their way into the minds of the majority of software developers.

Domain Specific Languages

A DSL (Domain Specific Language) is a formal language designed specifically for a certain domain: computer graphics, warehouse management, flight path computation. SQL is one good example of a DSL. DSLs can be internal or external.
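An external DSL, like SQL, has its own syntax and parser, while an internal DSL is built from the constructs of the host language. Below is a minimal sketch of an internal DSL in Python, using method chaining; the `Query` class, its methods, and the sample data are all hypothetical.

```python
class Query:
    """A tiny internal DSL: chained method calls that read like a sentence."""

    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, **conditions):
        # Keep only the rows whose fields equal all the given values.
        matches = [
            row for row in self.rows
            if all(row.get(key) == value for key, value in conditions.items())
        ]
        return Query(matches)

    def select(self, *columns):
        # Project each remaining row onto the requested columns.
        return [tuple(row[column] for column in columns) for row in self.rows]


packages = [
    {"id": "PKG-1", "aisle": "A", "fragile": True},
    {"id": "PKG-2", "aisle": "B", "fragile": False},
    {"id": "PKG-3", "aisle": "A", "fragile": False},
]

result = Query(packages).where(aisle="A", fragile=False).select("id", "aisle")
print(result)
```

The last statement reads almost like the question it answers, which is the point of a DSL: the vocabulary belongs to the domain, not to the machine.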

JetBrains uses the term Language Oriented Programming to describe a paradigm where you build a special language for every special job.

Alan Kay worked for a while on the STEPS operating system, which was built upon dozens of languages, each designed for its particular purpose.

IDEs

Among the things Daly mentions as benefits of Literate Programming are:

Indexing and reverse links from a piece of code to other pieces which use it
Tables of contents
References

However, none of these things will surprise anyone who has used a modern IDE. On the contrary: using a literate programming technique will likely make it impossible for us to use a normal IDE. We're very much accustomed to refactorings, search in code, snippets, integration with linters and so on, and losing them would more likely hamper our development experience than improve it.

Linting

With linters and static code analysis, we can find particular errors in our code which are easy to recognize. Daly provides an example. Someone introduces the number 153 in the code. What does that number mean? Why 153 and not 654?

Daly continues that, because it happened in a literate program, the author was able to write something like:

I chose this number for no particular reason. Choice of the number does not change anything.

If a linter found such a number, it would report a magic constant which should be given an explanatory name.

But a linter can't do anything if you forget to describe something in the literate source: that is natural language, so there is no hope of checking it.
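The magic constant check is simple enough to sketch with Python's standard `ast` module. The rule below is an assumption made for illustration, not the behaviour of any particular linter: a numeric literal is accepted if it is 0 or 1, or if it is assigned directly to an UPPER_CASE name.

```python
import ast

ALLOWED = {0, 1}  # assumption: these literals never need a name


def find_magic_numbers(source):
    """Return (line, value) pairs for numeric literals that lack a name."""
    tree = ast.parse(source)

    # Literals assigned directly to an UPPER_CASE name count as named constants.
    named = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id.isupper()
                and isinstance(node.value, ast.Constant)):
            named.add(id(node.value))

    findings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)
                and node.value not in ALLOWED
                and id(node) not in named):
            findings.append((node.lineno, node.value))
    return sorted(findings)


sample = "BUFFER_SIZE = 153\nsize = 153\ntotal = size + 1\n"
print(find_magic_numbers(sample))
```

The check can demand a name for 153, but it cannot tell whether the name, or a paragraph of prose next to it, actually explains the choice.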

Notebooks

Well, when I said above that LP did not get any traction, I was not entirely honest. One particular form of Literate Programming is quite widespread. It is called Notebooks.

By a notebook, we mean a document which consists of a series of blocks; every block can be either code or free-form text. People use notebooks to perform data analysis, conduct experiments, write one-off scripts, or do Reproducible Research. Examples:

Conclusion

Should you use Literate Programming in your project or not? It depends.

After watching Daly's emotional talk, I was fascinated for a while by the idea of converting my ysv project into a literate program. However, I decided not to.

Rust is a very expressive language. If I can master it well enough, I can introduce powerful abstractions which will permit me to reduce the syntactic clutter, and to make the code as transparent and concise as humanly possible.
I will, however, use rustdoc, the recommended way of documenting Rust source code. I will comment the difficult parts of the code to describe what they are doing, and I will use standard Rust tools to generate documentation from this source.
And, if I want to invite other people and communicate with them, I'd better use the language many people speak instead of creating my own obscure toolchain to support my overly literate project.
And finally, I was thinking lately about the meaning of focus in life. Adding literate programming as another dimension of ysv project would mean I am continuing to scatter my attention. That will not help me to succeed and get this project done.


DEV Community