Abstraction, Composition as Software Development Fundamentals: Reading List 1-3/2022

#computerscience #software #projectmanagement #categorytheory

I have decided to organise my studies of computer science based on trimestral reading lists. For the first trimester, I am taking the advice of Joe Armstrong, from his talk "Computer Science: A Guide for the Perplexed". Between January and March, I set myself to reading:

Algorithms + Data Structures = Programs, Niklaus Wirth;
The Mythical Man-Month, Frederick Brooks;
Category Theory for Programmers, Bartosz Milewski.

The three texts discuss computer science on a fundamental level, but from three different aspects: the elements and operations of computer programs, the human process of writing software systems and products, and the algebra of high-level abstractions that on a lower level manifest themselves as data and algorithms.

Abstraction and Composition

Despite these differences in perspective, there are points of convergence between the three texts. The first is the role of abstraction as a way of conceiving programs, making the understanding easier and the architecture leaner. The second is the idea that composition is the fundamental operation in the architecture and building of programs and software systems.

Abstraction in computer science can be described as the action of zooming out, blurring details and providing a wide view. By abstracting, we hide the particular qualities of something and treat it simply as a named object; we hide the individual operations in a program and treat it as a named function.

{red, wheels: 4, fuel: .8, ignition: off} → car
{1 * 2 * 3 * 4 * ... n} → factorial(n)

Composition, on the other hand, is the action of joining parts of a program to solve a complex problem.

Both operations, composition and abstraction, can be applied iteratively. Consider the biosphere: it is a humongous quantity of atoms, related to each other or not based on the laws of physics, forming molecules. That is a ridiculously complex system to understand in terms of their interactions as living beings, so we abstract the complex structures of molecules into objects, and call these cells, which interact with each other by forming tissues and organs. But these systems are still far too complex; therefore, we iterate the same process, abstracting organisms into living beings, living beings into species, species into classes, kingdoms, and so on.

The same can be applied to computer systems, to great effect. We join elements to each other to create data structures, and can compose data structures themselves. We join functions acting on data, and create programs. We join programs to create systems.

Algorithms + Data Structures = Programs

"A systematic and scientific approach to program construction is bound to include all aspects of data structuring. Programs, after all, are concrete formulations of abstract algorithms based on particular representations and structures of data."

The progress of almost all sciences can be traced back to a few fundamental principles from which they develop. In the study of Metaphysics, for example, we are required to understand the concepts of substance and attributes, the intellect and the extension, the entities and the phenomena. All further development and the branching of newer disciplines depends on these principles. The same goes for the science of computation by machines.

This book, in the spirit of modern philosophers, concerns itself with the first principles of computer science: primitive data, basic operators, composition of data (fundamental and recursive data structures), composition of operations (procedures and functions), the building blocks of computer programs, from which we can build all sorts of complex systems.

The book is divided into 5 subjects. Each step is necessary for the next; just like we design programs by composition of functions and data types, we understand the science itself by composing ideas. They are:

Basic data structures;
Algorithms;
Recursion;
Advances data types;
Compilers.

Concerning data, the main problem is how we represent the abstract objects, structures and operations we have conceptually designed (i.e. the product), in a form that fits and executes in the physical constraints of the machine.
For this, we require a readable abstraction for the binary operations done by the machine, allowing us to represent the data structures and operations that compose the system in a human language. This is the finality of the programming language. The author explains:

"The essence of the use of abstractions in programming is that a program may be conceived, understood, and verified on the basis of laws governing the abstractions, and that it is not necessary to have further insight about how the abstractions are implemented and represented in a physical computer."

The programming language allows us not only to represent the simple elements, but also to abstract functions and procedures as named objects, therefore allowing us to reference them, and implement repetitive operations through recursion.

fact 0 = 1
fact n = n * fact(n - 1)

By applying the idea of recursive referencing to data, we can also create repetitive, complex, and potentially infinite data structures, such as lists and n-ary trees. In a lower level, these abstract structures are implemented as pairs or tuples of elements, containing the element itself and references (pointers) to the next elements in the structure. But in a high level abstraction, these can be defined very simply in programming language notation.

list(a) = [] | [a | list(a)]

These are called dynamic structures, due to the fact that their structure and size change during the execution of the program. We can see that there are similarities between the definitions of algorithms and data.

The Mythical Man-Month

“The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.”

Recognising that the human aspect -- communication, leadership and subordination, decision making -- is a major part of software development is a hard pill to swallow. What we want is to explore new technologies, to write algorithms that produce expected results, and to create programs that solve real problems. But the reality is that problems are rarely understood, and conceptual schematics are rarely precise, leading to great difficulties in the ideally simple task of writing programs -- after all, how do we write stuff when we don't know what it is we have to write?

Joe Armstrong describes software development, in physics' terms, as an entropy-increasing process. It is an essential and inescapable problem that, as a project grows, so does the complexity of functions, procedures, and data structures: more attributes, more states to keep track, more secondary effects, more dependencies, more errors to handle.

The main virtue of software systems and products is described by Brooks as conceptual integrity: removing unnecessary components, therefore reducing potential bugs and making the system more easily understood. The task of the product developer is therefore to make the product grow, while containing the spread of complexity.

The project, as described by Brooks, has two requirements:

conceptual design, the logical abstract structure of how the parts of the system interact implementation;
representation (implementation) of the abstract structures as programming language code and compiled machine language code.

Both these aspects force us to face the problem of growing complexity. The main takeaway is that software development is an art that allows us to create great things, but it comes with a challenge for which there is no magical solution. We cannot wait for a wonderful programming language, machine or project management framework to make complexity disappear or communication to flow flawlessly. All we can do is, through diligence, apply proven experiences and try new things.

From the first programs in machine code to our current state of the art, several techniques have been developed and proven to work, both in terms of architecture and implementation:

centralised architecture and decision-making, which improves conceptual integrity;
rapid prototyping and incremental growth, which allows feedback iteration;
high-level languages, which remove accidental flaws;
improved tooling, which speeds up development and tests;
object-oriented programming, with abstract data types and inheritance
interactive programming;

We could add to the list the more recent trend of functional programming, which reduces bugs by eliminating side effects, and providing the ideal conditions for concurrent programming.

Category Theory for Programmers

"I hope future generations will be as admiring of the programming skills we've been displaying in building complex operating systems, web servers, and the internet infrastructure [as we admire the builders of old gothic cathedrals]. And they should be, because we've done all this based on flimsy theoretical foundations. We have to fix those foundations if we want to move forward."

There is a standard method of solving problems which seems to be common in a wide range of intellectual activities: it is the method of dividing a complex problem into smaller chunks, solving them separately, and composing the solutions. For example: when we try to understand a complex historical event, such as a revolution or a war, we cannot fully grasp the entire situation all at once; rather, we divide the narrative by different aspects: the economy, the political relationships of leaders, the cultural, philosophical and artistic spirit of the time. These aspects interact with one another, producing the results we call events.

Similarly, we use decomposition and composition in all sciences: mathematics, physics, music, computer science. The author of the book poses the question of whether this method is the way our species evolved to think.

Category theory is the science of abstraction par excellence. It allows us to understand the structure of complex systems, and how the elements interact with each other, without making reference to their qualities. It takes information and stripes down to the bare essentials, until all we have left are objects and arrows between them.

Categories are described in terms of objects and morphisms, where morphisms are arrows that go from one object to the next, and compose with one another. In the realm of programming, for example, we are mostly working with the category Type, where objects are types and morphisms are functions.

f :: a → b
g :: b → c
g . f :: a → b → c

But abstractions, as we described earlier, can be employed iteratively. Just as we have described functions as morphisms between objects in the category Type, we can wrap this structure in an object, and throw it as an object in a new category. This is the category Cat, where objects are small categories and morphisms are functors.

A functor is a mapping of objects and morphisms between two categories. Given a category C, a functor F maps its objects and morphisms to a category D, preserving identity and composition.

For C (a, b), where f :: a → b
Functor F gives D (Fa, Fb), where Ff :: Fa → Fb
For h = g . f in C
Fh = Fg . Fh in D

The mapping of morphisms is implemented as a higher order function fmap:

(a → b) → (Fa → Fb)

In programming, functors can be understood as parametric type constructors: equations that, given a type, produce a new type. We can, for example, take a type int and map it to types such as maybe_int or list_of_ints. Or, we can take the types int and boolean, and construct a list_of_ints_and_bools applying a bifunctor.

We can continue abstracting these structures to higher degrees. Just as we wrapped the objects and morphisms of small categories, forming Cat, we can wrap its objects and morphisms (functors) and throw these in a new category. This is the category Fun, where objects are functors and morphisms are natural transformations (mappings of functors).

As we can see, category theory is a powerful exercise in abstraction and composition, which, as was mentioned before, are often described as the essentials of programming. This ultra abstract science reveals to us a top-down approach to thinking. It consists of the declaration of general, abstract principles, that are then developed into material entities, just like the platonic idea of a chair is materialised into wooden or metal chairs. Brooks calls this the top-down approach for software development, where the work is divided between architecture (or design) and representation (or implementation).