Recently I read a few commentaries about comments in code, the most recent one right here on dev.to, so I figured I had to write mine too 😜
A bit of background about me: my main programming language is Ada, therefore I will refer to the Ada program structure and my examples will be in Ada too, but it should not be too difficult to adapt what I write to a different context.
Comments split the developer community in two: those who believe that comments are evil and that the code should be self-documenting and those who agree that self-documenting code is nice, but you need comments nevertheless.
The no-comment side usually says that comments can be misleading if they are incorrect or out of date. The pro-comment side replies that if you are a professional, you update your comments. To that the no-commenter answers that "update your comments" is a good-fellow policy, that is, a policy that works only if everyone is a good fellow.
Personally, I am more on the no-comment side, but keep reading because the issue is nuanced and even if you are a pro-commenter, you'll find that maybe our positions are not so far apart.
My position is similar to the one taken by the Ada Style Guide (yes! There is such a thing...). Let me quote an excerpt:
Comments in source text are a controversial issue. There are arguments both for and against the view that comments enhance readability. In practice, the biggest problem with comments is that people often fail to update them when the associated source text is changed, thereby making the commentary misleading.
...
If possible, source text should use self-explanatory names for objects and program units, and it should use simple, understandable program structures so that little additional commentary is needed.
Definitely a no-comment position, although not an extreme one (If possible...). A few lines below there is a sentence that summarizes my position:
Use comments to state the intent of the code. Comments that provide an overview of the code help the maintenance programmer see the forest for the trees. (emphasis mine)
Four tiers of comments
Over time I developed the idea that software documentation is organized in four tiers:
- Tier 0 Overall description of the program, specs, architecture. This tier is more about documentation than comments since I expect this kind of information to be in some PDF files rather than in the source code.
- Tier 1 Package level documentation. This information is often found at the beginning of the package interface specs (.ads for Ada, .h or .H for C and C++, and so on), maybe after a license blurb. Here I describe the goal of the package, why the package is necessary, and I give an abstract description of the resources defined there. To understand what I mean by abstract description, consider the following example of a package that defines a symbol table for some kind of interpreter:
-- This package defines a *symbol table* type used
-- by the interpreter to store information about the symbols
-- defined by the programmer.
--
-- A symbol table maps a *symbol name* to a *description* and
-- it must be able to
-- * Query the table for a symbol
-- * Define a new symbol
-- * Update the information associated to a symbol
-- * Push a new symbol table (when a new block is open)
-- * Pop the top symbol table
--
package Symbol_Tables is
   -- Add stuff here
end Symbol_Tables;
Note how the details of the API are not described: this description applies more or less to anything you could want to call a symbol table. Because of this, you can expect it to be fairly "stable," that is, it does not change frequently.
- Tier 2 API. At this level we describe the intent of each resource (types/functions/procedures) defined in the package.
With good Tier 1 documentation and descriptive names for the resources, the intent of most functions/procedures should be fairly clear. Nevertheless, a paragraph in human language does not hurt, especially to clarify details of the behavior that may not be evident from the interface (e.g., what happens if I query the symbol table with a symbol that is not defined).
Contracts (that is, pre- and post-conditions, in those languages that allow them) are a great way to write Tier 2 documentation since they are formal (they do not have the ambiguity of natural language) and, as long as contract checking is enabled, they cannot be out of sync with the code since otherwise an exception would be raised. Complement the formal contract with a sentence in English and you have the perfect mix for Tier 2 documentation.
For example, still with the case of the symbol table, one could have:
-- Return True if Table contains a symbol with name Name
function Contains (Table : Symbol_Table;
                   Name  : String)
                   return Boolean;

-- Get the description of the given symbol. Raises Symbol_Not_Found
-- if Name is not in Table
function Description_Of (Table : Symbol_Table;
                         Name  : String)
                         return Symbol_Info
  with
    Pre => Contains(Table, Name);

-- Define a new symbol. Raise Duplicate_Name if Name is
-- already in Table
procedure Insert(Table : in out Symbol_Table;
                 Name  : String;
                 Info  : Symbol_Info)
  with
    Pre  => not Contains(Table, Name),
    Post => Contains(Table, Name)
            and then Info = Description_Of(Table, Name);
Note how the pre- and post-conditions already describe the behavior of the functions in fair detail. For example, from the pre-condition of Insert I can see that I cannot call it if Name is already in Table; also, I can see that when Insert returns, Name will be in Table and its associated description will be Info. This is no surprise, I agree, but it is nice to have it formally stated in a contract. Also, if Insert has a bug and does not properly insert Name and Info, an exception will be raised on exit (provided that contract checks are enabled).
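To see the contracts at work, here is a minimal, self-contained sketch. It is not the real package: I assume that Symbol_Info is just an Integer and that the table is backed by a standard hashed map, and I keep everything nested inside a hypothetical Contract_Demo procedure so that it fits in one file. Note that in this sketch a violated pre-condition surfaces as Ada.Assertions.Assertion_Error, because the check happens before the body is entered.

```ada
pragma Assertion_Policy (Check);  -- ask the compiler to check contracts at run-time

with Ada.Assertions;
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Strings.Hash;
with Ada.Text_IO;

procedure Contract_Demo is

   -- Hypothetical, stripped-down stand-in for Symbol_Tables
   package Symbol_Tables is
      subtype Symbol_Info is Integer;  -- assumption: a real one would be a record
      type Symbol_Table is private;

      function Contains (Table : Symbol_Table; Name : String) return Boolean;

      function Description_Of (Table : Symbol_Table; Name : String)
                               return Symbol_Info
        with Pre => Contains (Table, Name);

      procedure Insert (Table : in out Symbol_Table;
                        Name  : String;
                        Info  : Symbol_Info)
        with Pre  => not Contains (Table, Name),
             Post => Contains (Table, Name)
                     and then Info = Description_Of (Table, Name);
   private
      package Maps is new Ada.Containers.Indefinite_Hashed_Maps
        (Key_Type        => String,
         Element_Type    => Symbol_Info,
         Hash            => Ada.Strings.Hash,
         Equivalent_Keys => "=");

      type Symbol_Table is record
         Map : Maps.Map;
      end record;
   end Symbol_Tables;

   package body Symbol_Tables is
      function Contains (Table : Symbol_Table; Name : String) return Boolean is
        (Table.Map.Contains (Name));

      function Description_Of (Table : Symbol_Table; Name : String)
                               return Symbol_Info is
        (Table.Map.Element (Name));

      procedure Insert (Table : in out Symbol_Table;
                        Name  : String;
                        Info  : Symbol_Info) is
      begin
         Table.Map.Insert (Name, Info);
      end Insert;
   end Symbol_Tables;

   use Symbol_Tables;

   Table : Symbol_Table;
begin
   Insert (Table, "x", 42);
   pragma Assert (Contains (Table, "x"));
   pragma Assert (Description_Of (Table, "x") = 42);

   begin
      Insert (Table, "x", 7);  -- violates the pre-condition of Insert
      Ada.Text_IO.Put_Line ("contract not checked");
   exception
      when Ada.Assertions.Assertion_Error =>
         Ada.Text_IO.Put_Line ("pre-condition violation detected");
   end;
end Contract_Demo;
```

With assertion checks enabled, the second call to Insert should be rejected before its body runs. If the checks are disabled, the contracts are still there as documentation, but nothing enforces them anymore.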
- Tier 3 Implementation comments. This is the tier where the self-documenting code should replace the comments. At this level you describe the implementation details (algorithm used, data structures, ...) and because of this, comments at this level can be quite volatile, therefore increasing the probability of having a misleading comment. At this level I prefer to follow the self-documenting approach, adding comments only in those cases where the algorithm is fairly involved (e.g., the Euclidean algorithm for GCD).
Often I prefer to use assertions like pragma Assert, pragma Loop_Invariant or pragma Loop_Variant to have the equivalent of a contract, but for Tier 3. For example, in a loop for a binary search one might want to write a comment like
-- Here Data(Top) >= X and X > Data(Bottom)
but I prefer to replace that with
pragma Assert(Data(Top) >= X and X > Data(Bottom));
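To put that assertion in context, here is one possible binary search loop built around it. This is only a sketch under my own assumptions: the names Lower_Bound, Int_Array and Middle are made up for the example, Data is supposed to be sorted in increasing order, and the search returns the first index whose element is not smaller than X.

```ada
pragma Assertion_Policy (Check);  -- check assertions and contracts at run-time

with Ada.Text_IO;

procedure Binary_Search_Demo is

   type Int_Array is array (Positive range <>) of Integer;

   -- Return the smallest index I such that Data (I) >= X.
   -- Assumption: Data is sorted in increasing order.
   function Lower_Bound (Data : Int_Array; X : Integer) return Positive
     with Pre => Data'Length >= 2
                 and then Data (Data'First) < X
                 and then X <= Data (Data'Last)
   is
      Bottom : Positive := Data'First;
      Top    : Positive := Data'Last;
      Middle : Positive;
   begin
      while Top - Bottom > 1 loop
         -- The assertion replaces the comment: it holds on entry thanks
         -- to the pre-condition and every iteration preserves it
         pragma Assert (Data (Top) >= X and X > Data (Bottom));

         Middle := Bottom + (Top - Bottom) / 2;

         if Data (Middle) >= X then
            Top := Middle;
         else
            Bottom := Middle;
         end if;
      end loop;

      -- Here Top = Bottom + 1, so Top is the first index with Data (Top) >= X
      return Top;
   end Lower_Bound;

   Data : constant Int_Array := (1, 3, 5, 7, 9, 11);
begin
   pragma Assert (Lower_Bound (Data, 7) = 4);
   pragma Assert (Lower_Bound (Data, 8) = 5);
   pragma Assert (Lower_Bound (Data, 2) = 2);
   Ada.Text_IO.Put_Line ("all assertions hold");
end Binary_Search_Demo;
```

If a later change breaks the update of Top or Bottom, the assertion fails at the first iteration where the invariant no longer holds, while a comment would silently become wrong.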
As another example, in the loop of the Euclidean algorithm for the computation of GCD, one could want to write
-- The value of Smaller decreases at each iteration and since
-- Smaller is non-negative the loop will end sooner or later
to explain why the loop will end; however,
pragma Loop_Variant(Decreases => Smaller);
describes the same fact, but in a more formal way that can be checked at run-time and possibly used to formally prove the correctness of the code.
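For completeness, this is how the variant could sit inside a GCD loop. Take it as a sketch: the names Gcd, Larger and Remainder are my own choices for the example, and pragma Loop_Variant is a GNAT/SPARK pragma, so I am assuming a compiler that supports it.

```ada
pragma Assertion_Policy (Check);  -- check assertions (variant included) at run-time

with Ada.Text_IO;

procedure Gcd_Demo is

   function Gcd (A, B : Positive) return Positive is
      Larger    : Natural := Natural'Max (A, B);
      Smaller   : Natural := Natural'Min (A, B);
      Remainder : Natural;
   begin
      while Smaller > 0 loop
         -- Checked at every iteration after the first: Smaller must be
         -- strictly smaller than it was at the previous iteration
         pragma Loop_Variant (Decreases => Smaller);

         Remainder := Larger mod Smaller;  -- always less than Smaller
         Larger    := Smaller;
         Smaller   := Remainder;
      end loop;

      return Larger;
   end Gcd;

begin
   pragma Assert (Gcd (48, 18) = 6);
   pragma Assert (Gcd (17, 5) = 1);
   pragma Assert (Gcd (7, 7) = 7);
   Ada.Text_IO.Put_Line ("all assertions hold");
end Gcd_Demo;
```

If someone later rewrites the loop body so that Smaller stops decreasing, the variant check is expected to raise an exception instead of leaving the program spinning forever.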
Summarizing...
At this point my position should be clear: Tiers 0, 1, and 2 are the maps (at increasing level of detail) that help us to navigate the forest. The information gets more "stable" as the tier level decreases, reducing the risk of having out-of-date comments. Here comments are useful and definitely cannot be replaced by self-documenting code.
Comments at Tier 3 are more tightly connected with the actual implementation and the risk of volatility is greater. In this case I prefer to follow the self-documenting approach, using comments sparingly and only in those cases where I think that a clarification is in order.
It should also be clear now why my position is not so far from that of a typical pro-commenter: the claims made by pro-commenters are fairly often about lower-tier information (e.g., "Without comments you must read all the code to understand what a function does" is a Tier 2 issue), while the risk of out-of-date comments, the main argument of no-commenters, is mostly a Tier 3 issue.
Top comments (1)
I think you have a great point here: the idea of thinking about tiers is a very interesting model for when to comment or not.
Some time ago I tried to reflect on how not to comment (instead of why not). I got this feeling that the problem, of course, is not with the concept of comments, but with the somewhat unstructured use of them. I didn't really think much about how to use them in a structured way, and your post offers a great example of that.