Chirag Jain

Posted on Mar 24, 2019

On Abstraction – Zach Tellman - ClojuTre 2017

#abstraction #softwareengineering

Zach Tellman is the author of the book "Elements of Clojure". In this talk, titled "On Abstraction", he tries to define what an abstraction is and how we can build better systems by understanding what goes into making one.

The goal of the book was to be the best second book you read about Clojure: the one for when you know what you can do with the language but not which parts of it to use to solve a particular problem.

The first chapter was about names: "Naming and Necessity". Even though it is well known that naming is a hard problem, CS doesn't have much literature tackling it. The analytic school of philosophy, on the other hand, has explored the topic in excruciating detail.

The second chapter was about abstraction, but there wasn't any literature directly targeting it, and no other field deals with the topic in depth.

Abstraction

The word abstraction is used to refer to 2 very distinct concepts, demonstrated by the following 2 ideas:

Church Numerals

Alonzo Church's lambda calculus introduced these. Numbers are represented as functions: the number 3 is a function that takes a function and applies it 3 times to a value.
Cons Cells

Typically used to create linked lists, with the base case of Nil representing an empty collection.
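
Both ideas fit in a few lines of code. The following is a minimal Python sketch, not something shown in the talk; the helper names (zero, succ, to_int, cons, to_list) are my own, and None stands in for Nil.

```python
# Church numerals: the number n is a function that applies f to x n times.
def zero(f):
    return lambda x: x          # apply f zero times

def succ(n):
    # The successor of n applies f one more time than n does.
    return lambda f: lambda x: f(n(f)(x))

three = succ(succ(succ(zero)))

def to_int(n):
    # Read a Church numeral back by applying "+1" to 0.
    return n(lambda k: k + 1)(0)

# Cons cells: a (head, tail) pair, with NIL marking the empty collection.
NIL = None

def cons(head, tail):
    return (head, tail)

def to_list(cell):
    # Follow the links until NIL is reached.
    items = []
    while cell is not NIL:
        head, cell = cell
        items.append(head)
    return items

numbers = cons(1, cons(2, cons(3, NIL)))
```

Here to_int(three) walks three function applications to recover an integer, while to_list(numbers) follows three links. Neither representation says anything about how expensive those steps are on real hardware, which is the point of the comparison.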

Church numerals weren't adopted in computer science because representing numbers this way is computationally expensive, but they have been widely used in mathematics (which is timeless). Cons cells are widely used in CS because following a link was fast, but the landscape has changed: processor speed has improved far more rapidly than memory latency has decreased.

The most common formal definition of abstraction in CS literature comes from the paper "Proof of Correctness of Data Representations" by Tony Hoare, who also brought us "Communicating Sequential Processes".

The paper introduces 2 major concepts:

Abstractions map the internal model onto the external semantics.

Invariants constrain the internal model.

Abstraction is the mapping of the internal implementation to the external interface. The invariant allows different implementations to exist without having to be correct with respect to each other; each one only has to satisfy its invariant. The paper, however, doesn't talk about the environment the object operates in.
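
As a rough illustration of these two concepts, here is a minimal Python sketch. It is not code from the paper or the talk: the class names ListSet and SortedSet and the methods invariant() and abstraction() are hypothetical, chosen only to show two different internal models that map onto the same external meaning (a mathematical set).

```python
import bisect

class ListSet:
    """Internal model: an unordered list. Invariant: no duplicates."""
    def __init__(self):
        self._items = []

    def insert(self, x):
        if x not in self._items:
            self._items.append(x)

    def invariant(self):
        # Constrains the internal model: every element appears once.
        return len(self._items) == len(set(self._items))

    def abstraction(self):
        # Maps the internal model onto the external semantics.
        return frozenset(self._items)

class SortedSet:
    """Internal model: a sorted list. Invariant: strictly increasing."""
    def __init__(self):
        self._items = []

    def insert(self, x):
        i = bisect.bisect_left(self._items, x)
        if i == len(self._items) or self._items[i] != x:
            self._items.insert(i, x)

    def invariant(self):
        return all(a < b for a, b in zip(self._items, self._items[1:]))

    def abstraction(self):
        return frozenset(self._items)

unordered, ordered = ListSet(), SortedSet()
for value in [3, 1, 3, 2]:
    unordered.insert(value)
    ordered.insert(value)
```

After the loop the two internal lists differ ([3, 1, 2] versus [1, 2, 3]), yet both invariants hold and both abstraction() calls yield the same set. Nothing in either class describes the environment it runs in, which is the gap noted above.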

To define an abstraction we need 3 parts:

Model

The implementation
Interface

The means to interact with the Model
Environment

Everything else

Model

The model is initially empty. All of its data comes from the environment, so in a way the model reflects the environment. This is close to how physics builds models of real-world concepts. We can't do the same in software: physics uses deductive reasoning to create a model based on observation and keeps tweaking it until the predictions are correct, and doing that in software would require a very rich model.

50-60 years ago, physicists were actively involved in computer programming. In 1959, one of the first attempts at AI was made in a project called "General Problem Solver (GPS)", which used means-ends analysis.

Most software instead relies on inductive reasoning, which is based on analogy. This allows us to skip a lot of details, making the models a lot simpler. Example: a tick latches onto anything that generates heat or secretes butyric acid above a certain threshold. We cannot create a perfect model for this, but we don't need to. The model just needs to be accurate enough.
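
The tick example can be written down as a tiny model. This is only a sketch: the Stimulus type, the function looks_like_host and both threshold values are hypothetical and were picked for illustration, not taken from the talk or from biology.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only to make the example concrete.
HEAT_THRESHOLD_C = 30.0
BUTYRIC_ACID_THRESHOLD = 0.5   # arbitrary units

@dataclass
class Stimulus:
    temperature_c: float
    butyric_acid: float

def looks_like_host(stimulus: Stimulus) -> bool:
    # The whole model: warm enough, or smells enough of butyric acid.
    # Everything else about the world is omitted.
    return (stimulus.temperature_c > HEAT_THRESHOLD_C
            or stimulus.butyric_acid > BUTYRIC_ACID_THRESHOLD)

deer = Stimulus(temperature_c=38.0, butyric_acid=0.9)
rock = Stimulus(temperature_c=15.0, butyric_acid=0.0)
hot_water_bottle = Stimulus(temperature_c=45.0, butyric_acid=0.0)
```

The model accepts the deer and rejects the rock, but it also accepts the hot water bottle. It is wrong in that case, yet cheap and accurate enough in the environment a tick normally lives in.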

The model can only satisfice.

Herbert Simon (principal investigator for the GPS), author of the book "The Sciences of the Artificial", coined the term satisfice for a "pragmatically crappy solution".

It's easy to make a deductive model, but hard to make a useful one. We can reduce something to arithmetic for a deductive model, but that doesn't mean its predictions will be right.

Models assume everything that they omit is either invariant or irrelevant. When assumptions leak out, we have to use conventions. A convention doesn't make the assumption always valid; it only makes it less likely to be invalid.

The interface

Interfaces represent the intersection of many models or one model over time.

Consequences of our Model

To abstract is to ignore.

To think is to forget a difference, to generalise, to abstract. In the overly replete world of Funes there were nothing but details, almost contiguous details. (Jorge Luis Borges, "Funes the Memorious")

We can't take into account all the details of the world in our programs; we have to make assumptions to reason about things efficiently.

An abstraction is useful only if the assumptions are sound given the context it's being used in.

That means the usefulness of the software is a function of the context it's being used in.

To know an abstraction's assumptions, we must know its model. Possession does not imply understanding.

If the model assumes too much, we can:

Make the model larger
Replace our model
Narrow our intended use case

The following generations, who were not so fond of the study of cartography as their forebears had been, saw that the vast map was useless. (Jorge Luis Borges, "On Exactitude in Science")

If an abstraction doesn't solve the user's problem, they can:

Discard the abstraction
Wrap the abstraction
Create conventions
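
Of these three options, wrapping is the easiest to show in code. The sketch below is hypothetical: mean stands in for any abstraction whose model assumes too much (here, that its input is never empty), and mean_or_default is a wrapper name I made up.

```python
def mean(values):
    # The abstraction's model assumes at least one value.
    # An empty list violates that assumption and raises ZeroDivisionError.
    return sum(values) / len(values)

def mean_or_default(values, default=0.0):
    # The wrapper handles the case the original model ignores
    # and delegates to the abstraction everywhere else.
    if not values:
        return default
    return mean(values)
```

The wrapper leaves the original untouched, so it only helps callers who go through it. A convention ("never pass an empty list to mean") would instead rely on everyone remembering the assumption.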

If an abstraction can't be discarded it becomes coercive. What it doesn't see might disappear.

Software would be easy if it weren't for changing environments.

The environment consists of 3 things:

The entire world
Users
Other software components

Systems of abstractions

There are 2 approaches:

Principled

Make everything predictably structured, so that the side-effects of changes can be predicted accurately.
Adaptable

Make the parts sparsely connected such that when we make a change we only have to reason locally about the system.

There are 2 kinds of cultures:

Self-conscious

There's a person called the architect, who is an expert in the design and construction of buildings.
Unself-conscious

Everyone builds their own home, and there are not many other structures that they create.

Principled systems have hierarchies. There's a place for everything and everything is in its place.

Adaptable systems are graph-like: they don't have a central organizing principle, and we are able to make local changes to them.

If an abstraction is an island, our code becomes the Galapagos. Variation is only useful when it mirrors the problem being solved.

Principled code is brittle and predictable. Adaptive code is flexible and unpredictable. There's no in between. What we can do is layer them.

With adaptable components in a principled framework, the degrees of freedom become vestigial and disappear. There's nothing to adapt to if the environment doesn't change.

Principled components in an adaptive framework are called complex adaptive systems. If the principled system is too small then we have to write a lot of glue code. If it's too large then the replacement cost becomes extremely high.

Assumptions that fail together, belong together.

Principled components are smaller, faster, and can be understood incrementally.

We lean towards principled components too often, because our incomplete understanding of the environment leads us to make optimistic assumptions about it.

Balance is the key here. Things like logging libraries should be principled components; not everyone needs to roll their own. Custom requirements, meanwhile, should form the adaptive surroundings.

Original link to the talk

This is my 4th post in this series. Do check out the previous ones :)