I've previously written about Church encoding in the Joy programming language. The article focused on booleans, logic and predicates, but didn't touch on numbers. Church numerals have been haunting me ever since, and after stumbling on two other concatenative languages, Dawn and Stck, things have finally fallen into place for me (avoiding an obvious pun here) and I've been able to simplify the Church boolean scheme as well. Instead of
updating my previous article, which would be a rewrite anyway, I'm doing a new article on Church encoding in Joy.
This article is technically part of the series Elixir of Life, but is intended to be self-contained.
Feel free to skip this intro to Joy if you've read any of my other Joy articles.
Joy is a functional, stack-based concatenative programming language developed by Manfred von Thun. It has many similarities to lambda calculus, combinatory logic, and Lisp. Like combinatory logic, it differs from lambda calculus in that it doesn't feature lambda abstraction, the way in which formal parameters are specified in lambda calculus, and therefore does not have the concept of free variables. Joy differs from combinatory logic in that it is not based on function application, but rather on function composition. All Joy functions, often called combinators, take a stack as their sole input and return an updated stack as their result. Joy uses postfix notation, also often called reverse Polish notation.
Joy is a fully-fledged programming language packed with all kinds of convenience functions that facilitate everyday program flow, such as conditional execution, repeated execution, recursion, and IO. It also comes with a rich set of types. But in this article, we show how almost all of that can be thrown away and rebuilt from only a handful of functions and quotations. A quotation is essentially a program literal that takes the form of a list of functions that remain unevaluated unless explicitly executed. The following set of combinators, which can actually be reduced even more, is enough to render Joy Turing-complete:
dup - duplicates the top of the stack
pop - removes the top of the stack
swap - swaps the two top elements of the stack
i - executes a quotation (removes its wrapping []'s)
unit - quotes the top of the stack (wraps it in []'s)
cons - prepends the element below the top of the stack to a quotation on top
cat - concatenates the two top elements (quotations) of the stack
Here is a simple Joy program:
[pop] [dup] [pop] swap i i
It evaluates to
[pop]. Terms are executed from left to right. We first place [pop], [dup], and another
[pop] on the stack. Then swap takes the top
[pop] and the
[dup] from the stack, switches them around and puts them back on the stack. Next i unquotes
[dup] which was on top of the stack, and thus causes it to be executed, duplicating the
[pop] which was right below it on the stack. Finally, the next i unquotes
[pop] which then removes its duplicate that was below it on the stack. This leaves only
[pop] on the stack.
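To make this concrete, here is a toy Python model of the seven combinators (my own sketch, not real Joy: the stack is a Python list, a quotation is a list of words, callables are primitives, and any other word is pushed as a literal). It assumes the traced program is [pop] [dup] [pop] swap i i.

```python
# Toy model of Joy (assumption: stack = Python list, quotation = list of words).
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)  # run primitives, push literals
    return stack

def dup(s):  s.append(s[-1])                       # duplicate the top of the stack
def pop(s):  s.pop()                               # remove the top of the stack
def swap(s): s[-1], s[-2] = s[-2], s[-1]           # swap the two top elements
def i(s):    run(s.pop(), s)                       # execute the quotation on top
def unit(s): s.append([s.pop()])                   # wrap the top in a quotation
def cons(s): q = s.pop(); s.append([s.pop()] + q)  # prepend element below into quotation
def cat(s):  q = s.pop(); s.append(s.pop() + q)    # concatenate two quotations

# The example program [pop] [dup] [pop] swap i i:
result = run([[pop], [dup], [pop], swap, i, i], [])
print(result == [[pop]])  # True: a single [pop] is left on the stack
```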
Church encoding is a scheme by which any calculable operator and its data (operands) can be expressed purely in terms of functions. It has traditionally been explored mostly in the context of lambda calculus, but works well in concatenative languages, and - I imagine - Lisp, and combinatory logic. In fact, it works in
any Turing-complete language.
In simple terms, you only need a single primitive: the function. All other primitives (numbers, booleans, pairs, lists) can be encoded in terms of functions.
For example, in Joy (remember it uses postfix notation) it is possible to express the predicate
1 + 2 = 3 as follows:
1 2 + 3 =
where each of 1, 2, + and
= is a composite function that is defined in terms of a handful of primitive functions such as
cons. When executed, it leaves the value
true on the stack (since the sum of 1 and 2 does in fact equal 3), where
true is also a composite function defined in terms of a handful of primitives.
According to our scheme, the definition of
true happens to be:
true == [pop]
But there are numerous different schemes, each with its own pros and
cons (not avoiding an obvious pun here). There is no canonical representation of any literal, and its semantics as an operator (say
+) or operand (say
1) arises from the scheme as a whole and how it maps onto logic and arithmetic.
In this article we employ a scheme in which literals are represented as quotations - function compositions that remain unevaluated until explicitly evaluated.
Booleans can be represented as functions that expect two values on the stack, returning the one argument if true and the other if false. In other words, we can write reduction rules for true and
false as follows:
A B true i == A
A B false i == B
These reduction rules lead to the following definitions:
A B true i == A
A B true i == A B pop
A B true i == A B [pop] i
true == [pop]
A B false i == B
A B false i == B A pop
A B false i == A B swap pop
A B false i == A B [swap pop] i
false == [swap pop]
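In a toy Python model of Joy (the stack is a list, a quotation is a list of words; all names here are my own), we can check that these two quotations really do select between two values:

```python
# Toy model: callables are primitives, everything else is pushed as a literal.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)

true  = [pop]        # A B true i == A
false = [swap, pop]  # A B false i == B

picked_a = run(["A", "B", true, i], [])
picked_b = run(["A", "B", false, i], [])
print(picked_a, picked_b)  # ['A'] ['B']
```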
Don't stress if you don't quite yet follow the derivation process; we'll get back to it in more detail.
Note that we could easily reverse the definitions of true and
false. As long as we construct the boolean operators correctly, the definitions of true and
false are largely arbitrary.
While it is convenient to think of A and
B as arguments of the functions true and
false, they are in fact two elements on top of a stack that is the only argument of the functions.
Here is true in action:
A B true i
A B [pop] i # definition of true
A B pop # i unquotes the element on top of the stack
A # pop discards the top of the stack
Admittedly, there is nothing "boolean" here yet. The boolean nature of true and
false will only appear once we introduce some operators. Note the need for
i (apply) after the boolean function. It is required because true and
false are quotations and therefore "inert" unless applied.
The first operator that we'll consider is and.
The semantics we expect from
and is as follows:
false false and == false
false true and == false
true false and == false
true true and == true
And we achieve that with the following definition of and:
and == dup i
It is easy to verify that the above definition works as expected, but it
is not immediately clear how we arrived at the solution. Let's start with
some reduction rules and work our way down to a definition. In case
B is true, the reduction rule is:
A B and == A
And in case
B is false, the rule is:
A B and == B
In order to derive a definition for
and we want to rewrite the reduction rules in the form:
A B and == A B C
where C is the definition of
and. We do this by systematically introducing function compositions to the right hand side that are equivalent to the identity function
id. This is similar to how introducing
+ 0 or
+ x - x to a mathematical expression has no effect on its value.
We start with the case where
B is true.
A B and == A            # if B is true, only A counts
        == A B pop      # since B pop == id
        == A B [pop] i  # i unquotes [pop]
        == A B true i   # definition of true
        == A B B i      # since B is true
And in case
B is false:
A B and == B                # if B is false, we can ignore A
        == B A pop          # since A pop is a no-op
        == A B swap pop     # A B swap == B A
        == A B [swap pop] i # [swap pop] i == swap pop
        == A B false i      # definition of false
        == A B B i          # since B is false
And so we arrive at a common reduction rule that is independent of the value of B:
A B and == A B B i
        == A B dup i
and == dup i
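We can check the whole truth table for and == dup i in the same toy Python model of Joy used earlier (a sketch of my own, not real Joy):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)

true, false = [pop], [swap, pop]
AND = [dup, i]  # and == dup i

# Truth table in the order ff, ft, tf, tt:
table = [run([a, b] + AND, [])[0] for a in (false, true) for b in (false, true)]
print(table == [false, false, false, true])  # True
```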
We expect the following semantics from an or operator:
false false or == false
false true or == true
true false or == true
true true or == true
Using the same deductive process as before, we arrive at the following definition:
or == [dup] swap unit cat i swap i
   == [dup] dip swap i # defining dip == swap unit cat i
At first glance this is quite a bit more involved than the definition of and,
but we'll get to a more elegant definition below.
The not operator should simply negate the boolean that is on top of the stack:
false not == true
true not == false
Let us once again work our way down from reduction rules to a definition. If
A is true, then:
A not == false
      == false true pop     # true pop == id
      == false true [pop] i # quote then unquote
      == false true true i  # definition of true
      == false true A i     # since A is true
And similarly, when
A is false:
A not == true
      == true false pop
      == false true swap pop
      == false true [swap pop] i
      == false true false i
      == false true A i # since A is false
Now, regardless of the value of A:
A not == false true A i
A not == A [false true] dip i
not == [false true] dip i
With not in hand, the definition of
or simplifies to:
or == dup not i
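Both not and the simplified or can be verified in the toy Python model (my own names; dip, not and or are inlined as word sequences):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cat(s):  q = s.pop(); s.append(s.pop() + q)

true, false = [pop], [swap, pop]
DIP = [swap, unit, cat, i]          # dip == swap unit cat i
NOT = [[false, true]] + DIP + [i]   # not == [false true] dip i
OR  = [dup] + NOT + [i]             # or  == dup not i

nots = [run([b] + NOT, [])[0] for b in (false, true)]
ors  = [run([a, b] + OR, [])[0] for a in (false, true) for b in (false, true)]
print(nots == [true, false], ors == [false, true, true, true])  # True True
```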
We can also define nand == and not and
nor == or not. Incidentally, nand and
nor are the only two functionally complete operators, meaning that reduction rules for any other boolean operator can be written purely in terms of either one. Let's briefly look at how
nand can be implemented using only nor:
A B nand == A A nor B B nor nor A A nor B B nor nor nor
And, for symmetry, how
nor can be implemented using only nand:
A B nor == A A nand B B nand nand A A nand B B nand nand nand
The reader is invited to check the veracity of these reduction rules and to derive definitions for
nand and nor from them.
So far we have seen how choosing
true == [pop] and
false == [swap pop] has
allowed us to devise definitions for all the usual boolean operators. Our choice, which is not the only possible choice, is based on the idea of selectively discarding functions. I.e., pop the top of the stack, or pop the value below the top of the stack.
It is not surprising then that we can also define functions that behave like
conditionals or if-then-else blocks. We have essentially already come across a primitive if-then-else block in the definition of not.
We can take this one step further and define
branch with the following reduction rule:
cond [then] [else] branch == then # if cond is true
cond [then] [else] branch == else # if cond is false

cond [then] [else] branch == [then] i # if cond is true
cond [then] [else] branch == [else] i # if cond is false

cond [then] [else] branch == [then] [else] pop i # if cond is true
cond [then] [else] branch == [then] [else] swap pop i # if cond is false

cond [then] [else] branch == [then] [else] [pop] i i # if cond is true
cond [then] [else] branch == [then] [else] [swap pop] i i # if cond is false

cond [then] [else] branch == [then] [else] true i i # if cond is true
cond [then] [else] branch == [then] [else] false i i # if cond is false

cond [then] [else] branch == [then] [else] cond i i # if cond is true
cond [then] [else] branch == [then] [else] cond i i # if cond is false

# from this point onward the reduction rule is valid for any value of cond
cond [then] [else] branch == [then] [else] cond i i
cond [then] [else] branch == cond [then] [else] dig2 i i
cond [then] [else] branch == cond [then] [else] unit cons dip i i
branch == unit cons dip i i
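The derived branch == unit cons dip i i can be exercised in the toy Python model (names are my own; dip and branch are inlined as word sequences):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cons(s): q = s.pop(); s.append([s.pop()] + q)
def cat(s):  q = s.pop(); s.append(s.pop() + q)

true, false = [pop], [swap, pop]
DIP    = [swap, unit, cat, i]         # dip == swap unit cat i
BRANCH = [unit, cons] + DIP + [i, i]  # branch == unit cons dip i i

then_taken = run([true, ["then"], ["else"]] + BRANCH, [])
else_taken = run([false, ["then"], ["else"]] + BRANCH, [])
print(then_taken, else_taken)  # ['then'] ['else']
```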
We now have a simple conditional. If the condition is true we execute a then-block, otherwise we execute an else-block. It would be more useful if we also had a conditional block, so to speak. I.e. a quotation that, when executed, yields a boolean. This is where comparators like
= and > would be useful.
In order to arrive at definitions for such comparators, we need to look at Church numbers first.
Where Church booleans were centred around optional execution, Church numbers are centred around multiple execution.
A quick note on terminology. I will use "numeral" and "number" interchangeably, but they are not the same thing. "Numeral" is syntax, "number" is semantics. Both the Arabic numeral "3" and the Roman numeral "III" can have the meaning of the number three in the right context.
The semantics we are looking for is this:
[A] 0 i ==
[A] 1 i == A
[A] 2 i == A A
[A] 3 i == A A A
.
.
.
[A] N i == A A A ... A
A definition for
0 that satisfies the above reduction rule is
0 == [pop]. Similarly,
1 == [i] will do as a definition for 1. For
2, things are slightly more complicated:
[A] 2 i == A A
        == [A A] i
        == [A] [A] cat i
        == [A] dup cat i
        == [A] [dup cat i] i
2 == [dup cat i]
We are hoping for some kind of pattern to emerge, so that we don't need a new definition for every
numeral. Let's take a look at 3:
[A] 3 i == A A A
        == [A A A] i
        == [A A] [A] cat i
        == [A] [A] [A] cat cat i
        == [A] [A] dup cat cat i
        == [A] dup dup cat cat i
        == [A] [dup] 2 i [cat] 2 i i
        == [A] [[dup] 2 i [cat] 2 i i] i
3 == [[dup] 2 i [cat] 2 i i]
That gives us a definition of
3 in terms of its predecessor
2. We could expand the definition
all the way and replace 2 with
[dup cat i], but we won't gain much in doing so. Let's rather find out whether the pattern continues. I.e. can we define
4 in terms of its predecessor
3? It turns out that we can - the reader is invited to find a derivation - and it also turns out that the pattern holds for all
n > 0.
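The pattern can be checked mechanically in the toy Python model: num below builds numerals exactly as N == [[dup] M i [cat] M i i], and the hypothetical counting word tick (my own, standing in for A) counts how many times a quotation is applied.

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s): s.append(s[-1])
def pop(s): s.pop()
def i(s):   run(s.pop(), s)
def cat(s): q = s.pop(); s.append(s.pop() + q)

def num(n):  # 0 == [pop], 1 == [i], N == [[dup] M i [cat] M i i] with M the predecessor
    if n == 0: return [pop]
    if n == 1: return [i]
    m = num(n - 1)
    return [[dup], m, i, [cat], m, i, i]

def tick(s): s.append(s.pop() + 1)  # my counting word, standing in for A

# [A] N i applies A exactly n times: start from 0 and count the ticks
applied = [run([0, [tick], num(n), i], [])[0] for n in range(6)]
print(applied)  # [0, 1, 2, 3, 4, 5]
```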
We now have a way to apply a quotation
n times, but that in itself doesn't really mean that
2 has the semantics of the number two. Not without operators.
As a stepping stone to addition, the first operator that we'll consider is
succ, the successor function.
We expect the following semantics from a successor function:
0 succ == 1
1 succ == 2
2 succ == 3
.
.
.
In order to derive a definition for
succ, we will exploit the fact that numbers are defined in
terms of their predecessors. That is, if
N is our starting point, then by carefully positioning and manipulating
N to arrive at
[[dup] N i [cat] N i i], we will have found a definition for succ:
N succ == [[dup] N i [cat] N i i]
.
.
.
succ == unit [i] cat [[dup]] swap dup [[cat]] swap [i] cat cat cat cat
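A quick check of succ in the toy Python model (num and to_int are my own helpers; to_int counts how many times a numeral applies a quotation):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cat(s):  q = s.pop(); s.append(s.pop() + q)

def num(n):  # N == [[dup] M i [cat] M i i]
    if n == 0: return [pop]
    if n == 1: return [i]
    m = num(n - 1)
    return [[dup], m, i, [cat], m, i, i]

def tick(s): s.append(s.pop() + 1)
def to_int(n): return run([0, [tick], n, i], [])[0]

# succ == unit [i] cat [[dup]] swap dup [[cat]] swap [i] cat cat cat cat
SUCC = [unit, [i], cat, [[dup]], swap, dup, [[cat]], swap, [i], cat, cat, cat, cat]

succs = [to_int(run([num(k)] + SUCC, [])[0]) for k in range(5)]
print(succs)  # [1, 2, 3, 4, 5]
```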
Addition is the repetition of succession. Adding m to n is equivalent to calculating the m-th successor of n. This leads us to the following reduction rule:
N M + == N succ succ ... succ # succ repeated M times
N M + == N [succ] M i
N M + == N M [succ] swap i
+ == [succ] swap i
2 3 +
2 3 [succ] swap i
2 [succ] 3 i
2 succ succ succ
3 succ succ
4 succ
5
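The same sum can be computed in the toy Python model, with + inlined as [succ] swap i (num, to_int and tick are my own helpers):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cat(s):  q = s.pop(); s.append(s.pop() + q)

def num(n):  # N == [[dup] M i [cat] M i i]
    if n == 0: return [pop]
    if n == 1: return [i]
    m = num(n - 1)
    return [[dup], m, i, [cat], m, i, i]

def tick(s): s.append(s.pop() + 1)
def to_int(n): return run([0, [tick], n, i], [])[0]

SUCC = [unit, [i], cat, [[dup]], swap, dup, [[cat]], swap, [i], cat, cat, cat, cat]
PLUS = [SUCC, swap, i]  # + == [succ] swap i

two_plus_three  = to_int(run([num(2), num(3)] + PLUS, [])[0])
three_plus_zero = to_int(run([num(3), num(0)] + PLUS, [])[0])
print(two_plus_three, three_plus_zero)  # 5 3
```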
Multiplication can be defined in terms of addition. The product of n and m is equivalent to adding n m times to 0. That is, 2 times 3 is 0 + 2 + 2 + 2. We can generalize multiplication with the following reduction rule:
N M × == 0 N + N + ... N + # "N +" repeated M times
N M × == 0 [N +] M i
N M × == [N +] M 0 bury2 i
N M × == N [+] cons M 0 bury2 i
N M × == N M [[+] cons] dip 0 bury2 i
× == [[+] cons] dip 0 bury2 i
2 3 ×
2 3 [[+] cons] dip 0 bury2 i
2 [+] cons 3 0 bury2 i
[2 +] 3 0 bury2 i
0 [2 +] 3 i
0 2 + 2 + 2 +
2 2 + 2 +
4 2 +
6
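Multiplication also runs in the toy Python model; bury2 is inlined as [[] cons cons] dip swap i, as derived later in this article (all Python names are my own):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cons(s): q = s.pop(); s.append([s.pop()] + q)
def cat(s):  q = s.pop(); s.append(s.pop() + q)

def num(n):  # N == [[dup] M i [cat] M i i]
    if n == 0: return [pop]
    if n == 1: return [i]
    m = num(n - 1)
    return [[dup], m, i, [cat], m, i, i]

def tick(s): s.append(s.pop() + 1)
def to_int(n): return run([0, [tick], n, i], [])[0]

SUCC  = [unit, [i], cat, [[dup]], swap, dup, [[cat]], swap, [i], cat, cat, cat, cat]
PLUS  = [SUCC, swap, i]                       # +     == [succ] swap i
DIP   = [swap, unit, cat, i]                  # dip   == swap unit cat i
BURY2 = [[[], cons, cons]] + DIP + [swap, i]  # bury2 == [[] cons cons] dip swap i
TIMES = [[PLUS, cons]] + DIP + [num(0)] + BURY2 + [i]  # × == [[+] cons] dip 0 bury2 i

two_times_three = to_int(run([num(2), num(3)] + TIMES, [])[0])
print(two_times_three)  # 6
```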
The derivation of the
pow function is very similar to that of
×. However, instead of repetitively adding a value to 0, we now rather multiply 1 repetitively by some value. That is, 2^3 is equivalent to 1 times 2 times 2 times 2. Or, in general:
N M pow == 1 N × N × ... N × # "N ×" repeated M times
N M pow == 1 [N ×] M i
.
.
.
pow == [[×] cons] dip 1 bury2 i
2 3 pow
2 3 [[×] cons] dip 1 bury2 i
2 [×] cons 3 1 bury2 i
[2 ×] 3 1 bury2 i
1 [2 ×] 3 i
1 2 × 2 × 2 ×
2 2 × 2 ×
4 2 ×
8
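And the same worked example in the toy Python model, with pow built on top of the × sketch from above (all names are my own; bury2 is inlined as derived later):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cons(s): q = s.pop(); s.append([s.pop()] + q)
def cat(s):  q = s.pop(); s.append(s.pop() + q)

def num(n):  # N == [[dup] M i [cat] M i i]
    if n == 0: return [pop]
    if n == 1: return [i]
    m = num(n - 1)
    return [[dup], m, i, [cat], m, i, i]

def tick(s): s.append(s.pop() + 1)
def to_int(n): return run([0, [tick], n, i], [])[0]

SUCC  = [unit, [i], cat, [[dup]], swap, dup, [[cat]], swap, [i], cat, cat, cat, cat]
PLUS  = [SUCC, swap, i]                       # +     == [succ] swap i
DIP   = [swap, unit, cat, i]                  # dip   == swap unit cat i
BURY2 = [[[], cons, cons]] + DIP + [swap, i]  # bury2 == [[] cons cons] dip swap i
TIMES = [[PLUS, cons]] + DIP + [num(0)] + BURY2 + [i]   # ×   == [[+] cons] dip 0 bury2 i
POW   = [[TIMES, cons]] + DIP + [num(1)] + BURY2 + [i]  # pow == [[×] cons] dip 1 bury2 i

two_pow_three = to_int(run([num(2), num(3)] + POW, [])[0])
print(two_pow_three)  # 8
```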
So far, we've only dealt with functions that increase the value of a number. To get to subtraction, division, and roots, we need to start with
a predecessor function
pred. Here is what we expect from a predecessor function:
0 pred == 0 # I'll explain this later
1 pred == 0
2 pred == 1
3 pred == 2
.
.
.
Defining pred using Church encoding is quite complicated. Here is how we will go about it. We want to start at 0 and repetitively add 1 to it, but instead of repeating this n times, we'll repeat it n - 1 times. In a sense, that just defers the problem, because now we have to figure out when to stop. The secret lies in applying some
function that will act like
succ n times, except for the very first application. This we can achieve by maintaining a boolean along with our numerical value that keeps track of whether we are at the first function application or not. This leads us to the following reduction rule:
N pred == 0 false [[succ true] [true] branch] N i
From which we can derive a definition:
pred == 0 false dig2 [[succ true] [true] branch] swap i pop
To illustrate how it works, let's consider
2 pred
2 0 false dig2 [[succ true] [true] branch] swap i pop
0 false 2 [[succ true] [true] branch] swap i pop
0 false [[succ true] [true] branch] 2 i pop
0 false [succ true] [true] branch [succ true] [true] branch pop
0 true [succ true] [true] branch pop
0 succ true pop
1 true pop
1
Note that the predecessor of 0 is 0. The encoding we are dealing with here only applies to the natural numbers, so
no negatives. Extending the encoding to integers and reals and even complex numbers is possible, but not within
the scope of this article.
Now that we have pred, we can move on to subtraction (-):
N M - == N [pred] M i
- == [pred] swap i
So far we have looked at Church booleans and Church numbers. We even had to combine them to arrive at a definition for
pred. We now
turn to predicates. The first predicate we look at is
is-zero. In order to arrive at a definition for
is-zero, we will exploit the
fact that the numeral
0 can be used to apply some quoted function zero times, whereas any other numeral n will apply the quoted
function n times. If we choose the quotation to be
[pop false] and start with a value of
true, then if we apply the function zero times,
we end up with
true, which is what we want. If on the other hand we apply the function once or more, we will pop the initial true
and replace it with
false, only to pop
false and replace it with
false again, repeated n times. This leads us to the following reduction rules:
0 is-zero == true
1 is-zero == true pop false
2 is-zero == true pop false pop false
3 is-zero == true pop false pop false pop false
.
.
.
N is-zero == true [pop false] N i
is-zero == [true [pop false]] dip i
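A quick check of is-zero in the toy Python model (my own names; dip inlined):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cat(s):  q = s.pop(); s.append(s.pop() + q)

true, false = [pop], [swap, pop]

def num(n):  # N == [[dup] M i [cat] M i i]
    if n == 0: return [pop]
    if n == 1: return [i]
    m = num(n - 1)
    return [[dup], m, i, [cat], m, i, i]

DIP    = [swap, unit, cat, i]                # dip == swap unit cat i
ISZERO = [[true, [pop, false]]] + DIP + [i]  # is-zero == [true [pop false]] dip i

checks = [run([num(k)] + ISZERO, [])[0] for k in range(4)]
print(checks == [true, false, false, false])  # True
```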
Also, since n - m = 0 when m is larger than or equal to n, we can also make the following definitions:
<= == - is-zero
>= == swap - is-zero
Finally, if n <= m and n >= m, then we know that n = m:
m n = == m n <= m n >= and
m n = == [m n] i <= [m n] i >= and
m n = == [m n] [m n] [i <=] dip i >= and
m n = == [m n] [m n] [i <=] swap unit cat i i >= and
m n = == [m n] dup [i <=] swap unit cat i i >= and
m n = == m [n] cons dup [i <=] swap unit cat i i >= and
m n = == m n unit cons dup [i <=] swap unit cat i i >= and
= == unit cons dup [i <=] dip i >= and
We have now built up a sizeable arsenal of composite operators and operands. It is time to address division and root extraction.
The basic principle that we'll follow is that the quotient is the number of times that you can subtract the divisor from the dividend before you reach 0.
In other words
10 2 ÷ 5 = means that
10 [2 -] 5 i 0 = (subtracting 2 five times from 10 yields 0). In other words, in order to determine
N M ÷ we want to start with a quotient of 1 (
Q) and increment its value iteratively until the following no longer holds:
N [M -] Q i 0 >=
One way to achieve this is to use the
y combinator, which makes anonymous recursion possible. In the base case, when our condition no longer holds, we terminate and only leave the quotient on the stack. In the recursive case, we increment the quotient and recheck the condition. The resulting reduction rule for division is:
N M ÷ == 0 true
         [ bury2
           [ succ dup [N [M -]] dip i 0 >= dig2 i ]
           [swap pop]
           branch
         ] y
The objective is to extract
N and M to the front of the right hand side. The result is a definition for
÷ == [ [[bury2] dip] dip
       [ [[succ dup] dip] dip [[-] cons] cons cons dip i 0 >= dig2 i ]
       cons cons
       [swap pop] branch ]
     cons cons [0 true] dip y
Determining the *n*th root is very similar to division. But instead of repetitive subtraction, we now employ repetitive division. This time we increment the root
R until the following no longer holds:
N [M ÷] R i 1 >=
The reduction rule for root extraction looks like this:
N M √ == 0 true
         [ bury2
           [ succ dup [N [M ÷]] dip i 1 >= dig2 i ]
           [swap pop]
           branch
         ] y
From which we can derive the definition of √:
√ == [ [[bury2] dip] dip
       [ [[succ dup] dip] dip [[÷] cons] cons cons dip i 1 >= dig2 i ]
       cons cons
       [swap pop] branch ]
     cons cons [0 true] dip y
We started out with the ambition to devise an encoding using only basic functions by which the following would evaluate to true:
1 2 + 3 =
Let's put it to the test (taking a few shortcuts here and there):
1 2 + 3 =
3 3 =
3 3 unit cons dup [i <=] dip i >= and # definition of =
3 [3] cons dup [i <=] dip i >= and
[3 3] dup [i <=] dip i >= and
[3 3] [3 3] [i <=] dip i >= and
[3 3] i <= [3 3] i >= and
3 3 <= 3 3 >= and
3 3 - is-zero 3 3 swap - is-zero and
0 is-zero 3 3 swap - is-zero and
0 is-zero 3 3 - is-zero and
0 is-zero 0 is-zero and
0 [true [pop false]] dip i 0 [true [pop false]] dip i and
true [pop false] 0 i true [pop false] 0 i and
true true and
true true dup i # definition of and
true true true i
true true [pop] i # definition of true
true true pop
true
Besides the ability to do simple arithmetic and logic, we can also put Church-encoding numerals to other use.
One such application lies in generalizing stack shuffling functions like dig and bury. The
dig family of functions is used to dig values up from deep down on the stack.
dig1 is equivalent to swap.
dig2 brings the second value below the top to the top. And so on. Here are definitions for some dig functions:

dig1 == [] cons dip
dig2 == [] cons cons dip
dig3 == [] cons cons cons dip
Clearly, the deeper you want to dig, the more
cons functions are required. We can generalize
dig as follows:
dign == [] [cons] N i dip
The bury family does the opposite. It buries the top of the stack. Once again,
bury1 is equivalent to
swap. Here are definitions for some bury functions:
bury1 == [[] cons] dip swap i
bury2 == [[] cons cons] dip swap i
bury3 == [[] cons cons cons] dip swap i
And likewise, we can generalize this to:
buryn == [[] [cons] N i] dip swap i
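Both generalizations run in the toy Python model, using the Church numerals from earlier to control the number of cons applications (num, dig and bury are my own names):

```python
# Toy model of Joy: callables are primitives, everything else is pushed.
def run(prog, stack):
    for w in prog:
        w(stack) if callable(w) else stack.append(w)
    return stack

def dup(s):  s.append(s[-1])
def pop(s):  s.pop()
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def i(s):    run(s.pop(), s)
def unit(s): s.append([s.pop()])
def cons(s): q = s.pop(); s.append([s.pop()] + q)
def cat(s):  q = s.pop(); s.append(s.pop() + q)

def num(n):  # N == [[dup] M i [cat] M i i]
    if n == 0: return [pop]
    if n == 1: return [i]
    m = num(n - 1)
    return [[dup], m, i, [cat], m, i, i]

DIP = [swap, unit, cat, i]  # dip == swap unit cat i

def dig(n):   # dign  == [] [cons] N i dip
    return [[], [cons], num(n), i] + DIP

def bury(n):  # buryn == [[] [cons] N i] dip swap i
    return [[[], [cons], num(n), i]] + DIP + [swap, i]

dug    = run(["a", "b", "c"] + dig(2), [])
buried = run(["a", "b", "c"] + bury(2), [])
print(dug, buried)  # ['b', 'c', 'a'] ['c', 'a', 'b']
```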
Utilities:

dip == swap unit cat i
y == [dup cons] swap cat dup cons i
dign == [] [cons] N i dip
buryn == [[] [cons] N i] dip swap i
Booleans:

true == [pop]
false == [swap pop]
and == dup i
or == [dup] dip swap i
not == [false true] dip i
nand == and not
nor == or not
Numbers:

N == [[dup] M i [cat] M i i]
succ == unit [i] cat [[dup]] swap dup [[cat]] swap [i] cat cat cat cat
+ == [succ] swap i
× == [[+] cons] dip 0 bury2 i
pow == [[×] cons] dip 1 bury2 i
pred == 0 false dig2 [[succ true] [true] branch] swap i pop
- == [pred] swap i
is-zero == [true [pop false]] dip i
÷ == [[[bury2] dip] dip [[[succ dup] dip] dip [[-] cons] cons cons dip i 0 >= dig2 i] cons cons [swap pop] branch] cons cons [0 true] dip y
√ == [[[bury2] dip] dip [[[succ dup] dip] dip [[÷] cons] cons cons dip i 1 >= dig2 i] cons cons [swap pop] branch] cons cons [0 true] dip y
where N is any natural number and
M is the predecessor of N.
Conditionals and comparators:
branch == unit cons dip i i
<= == - is-zero
>= == swap - is-zero
= == unit cons dup [i <=] dip i >= and