DEV Community

Danie Palm
Danie Palm

Posted on • Updated on

Proteins as programs - a Joy interpreter in Elixir

In the previous post I presented a parser written in Elixir for a subset of the Joy programming language. The natural next step is to develop an interpreter. When we're done, we will be able to parse and interpret Joy programs with the custom sigil ~J(...):

iex(1)> import Joy
Joy
iex(2)> ~J([cat] dup)
[[:cat], [:cat]]

Code

A programming language is said to be homoiconic if programs written in the language can be represented inside the code using the very data structures that are used to represent data in the language. In other words program = data. And in yet other words, the language makes use of "program literals" to present code in the same way that languages often make use of numerals and string literals to represent numbers and text. Programs in homoiconic languages can act on other programs or copies of itself as input and even contain (quoted) programs in its source code. But programs cannot contain full copies of themselves, not verbatim at least.

The notion of program = data is perhaps a bit misleading. Interpreters need to know when to work with a program as data and when to treat it as code. This is known as the use-mention distinction, which is nicely illustrated in this example from Wikipedia:

Use: Cheese is derived from milk.
Mention: "Cheese" is derived from the Old English word "ċēse".

Homoiconic languages therefore always have some way of quoting programs in order to indicate when they are "mentioned" (as data) as opposed to "used" (executed as code). More accurately then, we can say that quoted programs are data ("program" = data). Quotation usually involves some special syntax such as quotation marks, or brackets, but this is not a prerequisite; the position of the subprogram within the larger program could also imply quotation, such as in lambda calculus.

One of the most interesting applications of the use-mention distinction is found in quines. A quine is a computer program that takes no input and that returns its own source code as output (often printed to the standard output). The term "quine" was coined by Douglas Hofstadter in honour of Willard Quine, which devised this delightful use-mention paradox:

"yields falsehood when preceded by it quotation" yields falsehood when preceded by its quotation

The truth-value of the the above statement is undecidable. It cannot be shown to be true or false. Kurt Gödel used a similar construct to show that there are formally undecidable statements in number theory. In doing so, he needed to find a way to quote mathematics inside mathematics. This he accomplished with Gödel numbering.

Quines can be devised in any language that is Turing complete. But the solution is not always elegant. Rosetta Code maintains a page of quine implementations in various languages. Here are the Elixir and Joy solutions from Rosetta Code:

a = <<"a = ~p~n:io.fwrite(a,[a])~n">>
:io.fwrite(a,[a])
"dup put putchars 10 putch" dup put putchars 10 putch

Quines always exist of a use part and a mention part. In the Elixir example above, part of the program is mentioned (as a string literal) and the same is then used to actually print the output.

A particularly elegant representation of the quine is found in lambda calculus:

(λx.xx)(λx.xx)

This is known as the Omega combinator and is a repetition of the Mockingbird combinator λx.xx as popularized by Raymond Smullyan in his book To Mock a Mockingbird. The second occurrence of λx.xx is the "mention" part of the quine, whereas the first occurrence is the "use" part. The Omega combinator does not terminate. It evaluates to itself, which evaluates to itself. And so on. The elegance of this quine originates in the fact that lambda calculus is homoiconic. So, can we do any better in Joy, which is also homoiconic?

Indeed we can. In Joy, dup is close to the concatenative counterpart to the Mockingbird combinator. So what if we apply dup to itself?

[dup] dup

Note that [dup] acts as the "mention", part whereas the following dup is the "use" part. It evaluates to:

[dup] [dup]

But that is not quite what we started off with. We need the first occurrence of dup to be quoted, but not the second one. We can accomplish this with cons:

[dup cons] dup cons

Let's evaluate it:

[dup cons] dup cons        (initial program)
[dup cons] [dup cons] cons (applied dup)
[[dup cons] dup cons]      (applied cons)

It evaluates to [[dup cons] dup cons], which is the quoted form of what we started off with. There is a bit of a subtlety here. Technically we started with [[dup cons] dup cons] which we then interpreted/executed to yield [dup cons] dup cons and so on. We will encounter this same subtlety in the Elixir Joy interpreter developed below.

The exact counterpart of the Mockingbird combinator can be defined as follows:

[A] m == [A] A

If we then proceed to mock the Mockingbird, we end up with a program that endlessly produces itself:

[m] m

The Mockingbird combinator can also be defined in terms of some more familiar combinators that we've already encountered:

m == dup cons i

On a side note. The famous Y combinator can be defined like this in Joy:

y == [dup cons] swap cat dup cons i

Then passing the empty quoted program [] to it, yields our quine:

[] y
[] [dup cons] swap cat dup cons i (definition)
[dup cons] [] cat dup cons i      (swapped)
[dup cons] dup cons i             (catted)
[dup cons] [dup cons] cons i      (dupped)
[[dup cons] dup cons] i           (consed)
[dup cons] dup cons               (interpreted)
...
[[dup cons] dup cons]             (as previously)

Just as the quine duplicates itself without directly referencing itself. The Y combinator can be used to achieve anonymous recursion: it can turn functions into recursive versions of themselves without having to be able to reference themselves.

Biology

(Bio)chemistry is the original homoiconic calculus. Molecules (in their capacity as code) act on other molecules (in their capacity as data). DNA often plays the "mention" part, whereas proteins play the "use" part.

There is a strong connection between quines and closure to efficient causation. In order to reproduce, life forms first copy their DNA and then execute the instructions in their DNA.

Living systems can be considered to be very sophisticated quines:

  • quoting corresponds to representing proteins as DNA encoded according to the genetic code
  • interpreting corresponds to the chemical processes by which some molecules act on others

But there is a fundamental difference between a typical quine and between an artificial life system. In the Joy quine [dup cons] dup cons there is a pre-ordained mapping between the quoted functions and their unquoted counterparts. And the translation of this mapping happens in the interpreter. I.e. it is deterministic. In biology, however, the genetic code itself is encoded in DNA. The mapping is therefore arbitrary and interpreted by the cell itself. The mapping can even change.

Code

The Joy program [dup cons] dup cons is represented by the Elixir list [[:dup, :cons], :dup, :cons]. Note the extra brackets that are present in Elixir. That is, programs essentially exist in a quoted form until they are executed.

Joy programs can be executed with the following function:

def __execute(stack, program) when is_list(stack) and is_list(program) do
  Enum.reduce(program, stack, fn
    function, stack when is_atom(function) ->
        Kernel.apply(__MODULE__, function, [stack])

    quotation, stack when is_list(quotation) ->
      [quotation | stack]
  end)
end

It takes a stack and a program as arguments. Both are represented as lists. The function initializes an accumulator with the stack and then iterates over the elements in the program, updating the accumulator (stack) on each step. The final argument of Enum.reduce is an anonymous function that on every step receives the element from the program, and the stack. If the element is a function, represented by an atom (when is_atom(function), the function is applied with the stack as the only member of the list of arguments. If, on the other hand, the element is a quotation, represented by a list (when is_list(quotation)), it is simply added to the top of the stack.

This is the essence of interpreting a Joy program. But the details lie in the implementations of the individual functions invoked by the Kernel.apply(__MODULE__, function, [stack]) step. Let's look at these in more detail:

Kernel.apply/3 expects three arguments: a module, a function, and a list of arguments. __MODULE__ simply evaluates to the current module. Functions are represented by atoms, and the arguments are simply a list.

Kernel.apply(__MODULE__, :dup, [stack]) is therefore the same as __MODULE__.dup(stack). In other words, the __execute function expects the function dup/1 to be defined in the same module. Here is the definition of dup/1.

def dup(stack) do
  # Extract the head from the stack
  [head | stack] = stack
  # Put the extracted head back on the stack, twice
  [head, head | stack]
end

Remember that in Elixir the result of the last line in a function is returned as the result of the function.

Here is the full interpreter:

defmodule Joy.Interpreter do
  def __execute(stack, program) when is_list(stack) and is_list(program) do
    Enum.reduce(program, stack, fn
      function, stack when is_atom(function) ->
        Kernel.apply(__MODULE__, function, [stack])

      quotation, stack when is_list(quotation) ->
        [quotation | stack]
    end)
  end

  def swap(stack) do
    [a, b | rest] = stack

    [b, a | rest]
  end

  def dup(stack) do
    [head | stack] = stack

    [head, head | stack]
  end

  def zap(stack) do
    [_ | stack] = stack

    stack
  end

  def unit(stack) do
    [a | rest] = stack

    [[a] | rest]
  end

  def cat(stack) do
    [a, b | rest] = stack

    case {a, b} do
      {a, b} when is_list(a) and is_list(b) -> [b ++ a | rest]
    end
  end

  def cons(stack) do
    [a, b | rest] = stack

    case {a, b} do
      {a, b} when is_list(a) -> [[b | a] | rest]
    end
  end

  def i([a | rest] = _stack) when is_list(a) do
    __execute(rest, a)
  end

  def dip([a, b | rest] = _stack) when is_list(a) and is_list(b) do
    [b | __execute(rest, a)]
  end
end

Finally, let's define a custom sigil that will parse and interpret a Joy program:

defmodule Joy do
  def sigil_J(string, _) do
    string
    |> Joy.Parser.parse()
    |> Joy.Interpreter.__execute()
  end
end

In closing, there are many levels of quoting and unquoting (interpreting) in biology. In a certain sense, the translation of DNA to proteins is a form of interpreting. And on another level, the folding of proteins from their inactive primary structure to their active tertiary or quaternary structures also represent interpreting. We shall encounter these concepts again.

Further reading

Reproducing programs by Manfred von Thun.

Discussion (0)