Eric Normand

Posted on Jul 4, 2017 • Originally published at lispcast.com

What is an abstraction?

#abstraction

For a term we use so much in our field, there are very few definitions of abstraction. And when I gave my talk called Building Composable Abstractions, that was a persistent question: what did I mean by abstraction? I'm going to be talking about abstractions a lot, both on this site, and in verbal discussions. I'd like to know what I'm talking about. Instead of searching for a precise definition, I'd like to expand on the background ideas that make it so difficult to define.

There are two uses of abstraction that I think will serve well to begin this discussion. The first is from one of my favorite programming books, Structure and Interpretation of Computer Programs by Abelson and Sussman (Section 1.1).

means of abstraction, by which compound elements can be named and manipulated as units.

In their "definition", abstraction is about naming. Naming is a funny thing. It's used for identity--as in the name of a person--and also to impart a meaning--as in to name an idea. We have a tendency as humans to come up with new terms. They are new names for perhaps new meanings. It is part of our natural linguistic abilities. When we program, we do this all the time. Whenever we create a variable or name a function, we're inventing a new term and assigning it a meaning.

Its meaning, in our program, is its behavior. What does this thing do? When should I use it? How do I use it? A function calculates a return value based on its arguments. You use the function by calling it or passing it as an argument to something that will call it. In Clojure, you call a function by putting its name in the first position in parens. In Clojure, functions can also have effects.

Programming language theorists call enumerating all of this meaning the semantics of the language. But it's just another word for meaning. Giving a descriptive name to a thing is all about giving it a clear meaning.

How are Clojure functions implemented by the compiler? We mostly don't care. We write functions and call them without much regard for the things they compile to. Insofar as we can ignore those implementation details, we call the abstraction robust. When that implementation becomes important and we can't trust the abstraction to work as it is intended, we call the abstraction leaky. A robust abstraction is able to hide the details from us and give us a new basis on which to base other abstractions.

That brings me to the second "definition", this one by Edsger Dijkstra.

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

Dijkstra's quote goes right to the matter of its purpose, which we have yet to go into. We want to create a new semantic level (meaning again) where we can be precise. My reading of the word precise is that we need to be able to say exactly what we mean and no more. We want our compound elements to be exactly suited to their purpose.

I believe this is the hard part of programming that we always talk about. We are simultaneously inventing a new purpose and the thing which is supposed to be suited to it. Then we have to name it to make that purpose clear. Three things we're doing simultaneously. That's a lot of degrees of freedom that can lead us to an imprecise abstraction.

There is a lot of distrust of abstraction in our industry, and I think rightly so. We have been burned time and again by abstraction for abstraction's sake and abstractions that hide problems. These are abstractions that don't fulfill their purpose and should be distrusted. But we should not distrust all abstractions. Part of our job is to learn what constitutes a good one.

Here's a great example of the distrust of abstractions. Joel Spolsky, internet entrepreneur, coined an aphorism he called "The Law of Leaky Abstractions":

All non-trivial abstractions, to some degree, are leaky.

He gives the example of TCP: the abstraction is "make a TCP connection and send data reliably, despite lost packets and other vagaries of networks". It's a great abstraction except it has a leak: what if you pull the network cable? Nothing will get through! According to the law, the mechanism will always "leak" through eventually. It reminds me that metaphors can only be stretched so far and that there will be some true things we cannot prove.

I think the law has a lot of truth in it, but his examples are a bit of a strawman. Why? Well, sockets handle a lot for you, and they also have well-defined errors. If the connection is severed, your socket will raise an error. So the socket abstraction is not actually hiding that network cable problem from you. On the contrary, it's building it into its fabric. Perhaps the real problem is that errors can so easily be ignored in our programming language. So I like to think about what I call an "Inverse Spolksy's Law":

All too-trivial abstractions, to some degree, are leaky.

The idea is that the abstractions that are leaky are the ones that are not precise in the way Dijkstra specified. They're hiding something they really can't hide. To make it more precise, you need to hoist more into the higher abstraction level than you would ideally want to. And this is the number one sin I find in abstractions: they hide too much.

Abstraction is something we also see in Algebra. We give a value a name, though we don't know what that value might be. We call these variables in Algebra. We can manipulate these names like values and arrive at sensible answers. This shows a very beautiful relationship between mechanical symbolic manipulation, meaning representation, and mechanical calculation. Abstractions have laws in themselves. Hence I can manipulate a program algebraically and talk about its properties, all without running it. The abstraction becomes a thing to talk about.

Something in there points to the roots of intelligence, and many books have been written about this. The reason it is so hard to talk about is that we don't have very good introspection into how we think. We're still understanding it after trying for thousands of years. We're better at doing it than knowing what we're doing.

The thing I love about programming is that we deal so directly in meaning, like an artist or a philosopher does. We know that, in the end, we are building a mechanism to control electron flow in our computers. Yet we work in the world of ideas. My brain wants nothing more than to escape from the mundanity of logic gates and build sculptures of thought.

Conclusions

We deal in abstractions every day as programmers. We're either using them, creating them, or debugging them. Abstractions are a natural extension of our linguistic abilities. They let us name concepts so that we can use them to form bigger ideas. Much of language design and industry programming books are about how exactly to make these abstractions. How do we go about making these things? Are there better and worse ways? These are some of the things we must understand better as software takes over more and more of our lives.

If you're interested in the big ideas in Computer Science and the philosophical questions of programming, you should check out the PurelyFunctional.tv Newsletter. It's a weekly romp through the theory and practice of Clojure and functional programming.

Top comments (7)

Kasey Speakman • Jul 5 '17 • Edited

I think one of the biggest reasons abstractions are leaky is because they are too prescriptive. E.g. Inherit from this class. Ok, which methods do I have to override to achieve what I want? Do I need to call base or not? How does this affect other properties of this class? What if the interface is synchronous, but I need to do some async work and it should not be blocking? So the author of this abstraction either has to make all these choices for me up front (sometimes called "opinionated") or has to provide parent class/method permutations for a lot of cases. Or it uses something like Reflection to adapt to many different cases. Either way exposes me to implementation details (as conventions) and limits me down to whatever fits in that box.

The most useful abstractions are ones that expose the operations you can perform and let you opt into and compose only the ones you need. When the abstraction is at the edges of a program (e.g. web server), it should stay as course-grained as possible. A good example of this is the Suave web server. The simplest request handler takes a raw HttpContext and returns one (optionally and asynchronously). You can handle the request absolutely any way you want or ignore it. But Suave has opt-in helpers for common scenarios like GET to handle only HTTP GET requests to a specific URI or setHeader, etc. These helpers are composable, and you can make your own. This abstraction places few limits on me, and I have the option of not learning/using the built-in helpers if I so desire.

Eric Normand • Jul 6 '17

Hey Kasey,

Yeah, great examples.

We tend to call those things "abstraction". And they are in the Sussman sense since we're naming a compound element. But it's not an abstraction in the Dijkstra sense, because it's not really making it so you can be totally precise. Or at least it's hard to make it precise. You're telling the user of your library that they can subclass this thing but they have to follow all of these poorly-defined rules. To make it less leaky, you should be able to limit the number of rules you have to follow, or at least make them automatically checked.

There are also leaks just from oversights. One of the massive oversights of many languages is allowing a Null Pointer. Now there's this mandatory check after every method returns. Do I actually have the thing the type told me or do I have null? People talk about the cost of Null Pointer exceptions in production. But what about the cost of adding an if statement after every method call? When we design an abstraction, we have to think hard about introducing corner cases and avoid them at all costs.

Rock on!
Eric

Huseyin Ergin • Jul 4 '17

I really like Dijkstra's definition (explanation) of abstraction. An abstraction will hide some of the stuff but not all. Only the ones that is related to the target domain of the abstraction should remain. Other irrelevant ones should be hidden. Preliminary examples are domain specific languages. But they are far from perfect at this point.

Eric Normand • Jul 6 '17

I really like that definition, too.

Follow up question: does your language actually let you do that? I mean, really omit unnecessary details?

Ben Halpern • Jul 4 '17

Great post, as always, Eric.

I have a thought about this line:

There is a lot of distrust of abstraction in our industry, and I think rightly so.

What about the abstractions that are so helpful and natural that we don't even think of them as examples? Like how we might complain that special effects ruin movies, when the only examples we can think of are the ones that did suck, and the seamless effects go completely unnoticed or unrecalled.

Is it possible that the discourse is skewed a bit too much because it's easier to notice the examples that confirm our distrust?

Eric Normand • Jul 6 '17

Hey Ben,

Thanks for the kind words.

Abstraction is supremely important in our work. I think we should distrust it more than we do, though, because 1) we don't understand it well and 2) it's too easy to do.

I think your example of special effects is spot on. In the hands of a master, or really good team, you never even notice the abstractions. Our industry is growing so fast that masters are rare. Since it's easy (and fun) to make abstractions willy-nilly, we tend to get lots of bad ones. If we had some process for making good ones, we could learn it and teach it. But right now, we don't really have one.

Really, I'm kind of ashamed. I've been programming since I was 12. I went to college. I have a Master's in Computer Science. And I've never read or written much about abstraction. I've tried to find stuff written on it, and there isn't much. We're unaware of what we're doing.

Rock on!
Eric