loading...

Programs as proteins - a ribosome programmed in Joy

palm86 profile image Danie Palm ・13 min read

Ribosomes are enzymes that are part protein and part RNA. They play a central role in the translation of messenger RNA to proteins according to the genetic code. And yet all the protein subunits of ribosomes are themselves the products of messenger RNA that was translated according to the genetic code by... a ribosome. Ultimately, the messenger RNA that codes for the protein subunits and the ribosomal RNA that directly contributes to the translation function of the ribosome are both the products of DNA transcription, so that the ribosome, like other enzymes, are fully represented in the genome. Without the ribosome there would be no interpreter of the genetic code and no point in having any genes. The ribosome is therefore the most suitable starting place for our Joy-based artificial life system.

We have previously touched on the Y combinator, a function that is able to produce a fixed point of other functions (or function compositions, such as entire Joy programs). In this post we will finally put the Y combinator and its anonymous recursion to good use when we loop through the catalytic steps of the ribosome.

Biology

Ribosomes are roughly half rRNA and half protein. The rRNA component is made up of 4 rRNA strands and the protein component comprises about 80 proteins. The exact composition differs between bacteria, archaea, and eukaryotes, but the overall function and structure is largely comparable: all ribosomes are made up of a small and a large subunit, both of which is a mix of rRNA and proteins.

The process of translation begins when the small ribosomal subunit binds to mRNA, either at the start (the 5'-end) or just prior to the start codon (AUG). It then scans towards the 3'-end until it reaches the start codon, at which point the large ribosomal subunit is also recruited to form the full ribosome.

Once the ribosomal complex is fully initiated, an amino acid-charged tRNA that is complementary to the start codon of the mRNA strand binds to a position called site A. Right after binding, the ribosome moves one codon to the right, so that the amino acid-charged tRNA is now located at a position known as site P. The next codon of the mRNA is now located at site A and once again a complementary charged tRNA binds to it. The amino acid at site P is now transfered to the amino acid at site A, and then the ribosome once again moves one codon to the right. Site P is now bound to a tRNA molecule that is charged with a peptide chain of two amino acids. This process continues until a stop codon is reached and on each iteration the peptide chain at site P grows by one amino acid, building up a protein of which the amino acid sequence is dictated by the mRNA strand being translated.

It is somewhat interesting to note that the protein subunits of the ribosome are not actively involved at the translation sites, but rather seem to support the rRNA that catalyses the elongation reaction. This could perhaps be an indication that primitive ribosomes were entirely made up of rRNA.

Code

Let's dive right in and take a look at the high-level logic that we want from a ribosome in the form of an Elixir function. The function takes a polypeptide (protein) list as the first argument and an mRNA list as the second. The first invocation should be with an empty polypeptide and an mRNA list of any length. It calls some undefined functions of which the names are hopefully self-explanatory.

def ribosome_worker(polypeptide, mrna) do
    if Enum.count(mrna) >= 3 do
        if polypeptide == [] do
            {codon, mrna} = Enum.split(mrna, 3)

            if codon == [:a, :u, :g] do
                # Initiate translation
                amino_acid = translate(codon)
                polypeptide = polypeptide ++ amino_acid

                ribosome_worker(polypeptide, mrna)
            else
                # Continue searching for start codon (AUG)
                [_first_base | last_two_bases] = codon
                mrna = last_two_bases ++ mrna

                ribosome_worker(polypeptide, mrna)
            end
        else
            # Elongate previously initiated polypeptide
            {codon, mrna} = Enum.split(mrna, 3)
            amino_acid = translate(codon)

            if amino_acid == [] do
                # the proper Elixir counterpart of what we're doing here
                # detracts from the argument, so we move along instead
                stash_current_polypeptide_somewhere_and_start_a_new_one()
            else
                polypeptide = polypeptide ++ amino_acid
            end

            ribosome_worker(polypeptide, mrna)
        end
    else
        polypeptide
    end 
end

In short, we want to scan over a messenger RNA (mRNA) strand (a list of atoms :a, :c, :g or :u) for as long as it is longer than or equal to 3 bases. As soon as the sublist [:a, :u, :g] is reached, protein synthesis is initiated. This initiation involves translating the AUG codon to its corresponding amino acid (methionine in this case), which forms the first amino acid of the polypeptide (protein) chain. After initiation we recurse. This time however, the polypeptide is not empty and so we don't search for the initiating AUG codon, but instead translate the next codon and attach the corresponding amino acid to the end of the growing polypeptide. Unless the codon corresponds to a stop codon, returning an empty amino acid, in which case the existing polypeptide chain is stashed (code for this is not shown) and a new chain is started. In either case, we once a again recurse, so that we either elongate the polypeptide or search for a new start codon. This process continues until the mRNA strand has less than 3 bases left, in which case it can contain no more codons so that the translation process is terminated.

The sequence as described above does not correspond exactly to the real process of translation of which there are many variations. For instance, when a ribosome encounters a stop codon, it detaches from the mRNA strand instead of continuing to search for the next occurence of AUG.

The problem we now face, is to turn the above pseudocode into Joy. Some of the challenges we encounter in this process are:

  • Joy, even without the restrictions we place on it, does not have the notion of variables
  • the ribosome_worker function is recursive, but since it is not one of our primitives (like dup, swap, etc.) there is no direct way to make a recursive call in the Joy version
  • we are using a minimal subset of Joy in which we don't have numbers
  • the pseudocode is in a mixture of Polish and infix notation, but Joy exclusively makes use of the reverse Polish notation

Joy doesn't have variables. Functions can only manipulate or act on values on the stack. Most Joy primitives only operate on the topmost one or two stack values. Before executing a function, it is therefore usually necessary to first manipulate the stack to make sure that the values expected by the function are located at the correct stack positions. Similarly, it may be necessary to do some stack cleanup after a function has been executed. In other words, it is sometimes necessary to dig out values from deep in the stack and place them on top, and other times it is necessary to bury values from the top of the stack deep underneath some others. Let's look at a family of such dig and bury functions.

We are already familiar with a function that will bury the top element (and simultaneously dig out the second element) - swap. But it is possible to bury the top element n positions deep with:

bury1 == [[] cons] dip swap i
bury2 == [[] cons cons] dip swap i
bury3 == [[] cons cons cons] dip swap i

in which cons is repeated n times.

For example:

[A] [B] [C] bury2 == [C] [A] [B]

Here is how it works:

[A] [B] [C] bury2
[A] [B] [C] [[] cons cons] dip swap i     (definition)
[A] [B] [] cons cons [C] swap i           (dip)
[A] [[B]] cons [C] swap i                 (cons)
[[A] [B]] [C] swap i                      (cons)
[C] [[A] [B]] i                           (swap)
[C] [A] [B]                               (i)

We cons up two items into a single list. Then we swap the list with the top element and unpack the consed up list again. Similary, we can dig out n elements with:

dig1 == [] cons dip
dig2 == [] cons cons dip
dig3 == [] cons cons cons dip

Note that dig1 and bury1 are both equivalent to swap. See The Theory of Concatenative Combinators for a more detailed treatment. We will not be using dign or buryn directly because we generally want to employ as few as possible amino acids. Instead, we will be writing out the bury and dig family of functions in full whenever they are needed.

Let's put this to the test. Suppose we have a function that takes 3 parameters: a number n and two strings even_string and odd_string. The function must print the value of even_string if n is even and the value of odd_string otherwise. In Elixir, it could look like this:

def odd_or_even(n, even_string, odd_string) do
  if rem(n, 2) == 0 do
    IO.puts(even_string)
  else
    IO.puts(odd_string)
  end
end

A corresponding Joy function, allowing the use of numbers and strings, could look like this:

[odd_or_even]
[
    [
        # check if the remainder of n divided by 2 is 0
    ]
    [
        # print odd string
    ]
    [
        # print even string
    ]
    ifte
]
define

Remember that ifte is the Joy counterpart of an if-then-else block. Now, the remainder check could look like:

2 rem 0 equal

This assumes that the value representing n is at the top of the stack. However, if the function is to be called like 4 "I am odd" "I am even" odd_or_even, then n is two places below the top and needs to be dug up first.

Similarly, for the case in which we want to print the value of even_string, we can't just write puts (the Joy built-in that writes to the standard output), because that would expect even_string to be at the top of the stack. Instead we have to dig once (or swap) to get it. Lastly, in order to print the value of odd_string, we can simply use puts because odd_string is indeed at the top. Here is the full Joy version:

[odd_or_even]
[
    [
        dig2          # dig up n
        2 rem 0 equal # check if n is even
    ]
    [
        swap puts     # print even_string if n is even
    ]
    [
        puts          # print odd_string otherwise
    ]
    ifte
]
define

Now that we know how to manipulate the stack, it is time to tackle recursion. Consider the following recursive function (assuming only non-negative values of n):

def sum(total, n) do
    if n == 0 do
        IO.puts(total)
    else
        sum(total + n, n - 1)
    end
end

Calling sum(0, 3) should print out the number 6. Here is the corresponding Joy version, with all the necessary digging and burying:

[sum]
[
    [0 equal] # n is at the top, so no digging
    [
        swap  # dig out total, which is below n
        puts  # ... and print it
    ]
    [
        dup   # duplicate n, which is at the top
        dig2  # dig out total which is now buried under two n's
        +     # sum total and the n below it
        swap  # so that the new total is under the n that remains from the earlier dup
        1 -   # subtract 1 from the n on top
        sum   # recurse
    ]
    ifte
]
define

The else-block that makes the recursive call is particularly involved, because it needs to first manipulate the stack so that it is ready for the first calculation total = total + n and then to prepare for the secod calculation n = n - 1. And finally it has to place total and n on the stack in the order that is expected by the sum function in preparation for the recursive call.

While the function above works, we are interested in a version of it that uses anonymous recursion. That is, instead of directly calling sum recursively, we need to execute something that is equivalent but unnamed. In a previous post, we have established the following effect of the Y combinator:

[p] y == [q] p

where [q] (short for [[dup cons p] dup cons p]), when executed, again yields [q] p. This means that if the program p decides to execute q by for instance invoking i when [q] is on top of the stack, then the result is that [q] is once again placed on top of the stack and p is once again executed, having again the option to unquote [q] whenever its logic requires it to do so. In this way p can effectively call itself anonymously.

We can now modify the definition of sum to make use of the Y combinator as follows:

[sum]
[
    [
        [dig1 0 equal] # [q] is now above n, so we have to dig n out
        [
            dig2       # dig out total, which is below n and [q]
            puts       # ... and print it
        ]
        [
            dig1 dup   # dig out and duplicate n
            dig3       # dig out total which is now buried under two n's and [q]
            +          # sum total and the n below it
            bury2      # bury the new total under [q]
            1 -        # subtract 1 from the n that is now on top
            bury1      # bury n under [q]
            i          # recurse: execute [q] which is on top
        ]
        ifte
    ] y
]
define

We have now directly or indirectly addressed all the challenges we faced in translating pseudcode to restricted Joy. Perhaps a final note on the absence of numbers is warranted. While we won't support numbers (there are too many of them and we aim to use as few symbols as possible), there is no limitation on performing certain operations a number of times. For instance, the predicate Enum.count(mrna) >= 3, can be calculated without using numbers:

not Enum.empty?(mrna) and not Enum.empty?(tl(mrna)) and not Enum.empty?(tl(tl(mrna)))

which is readily converted to Joy.

Without further ado, here is our ribosome, first with high-level functions like buryn and dign and itermediate definitions, and then in austere form using only primitives.

High-level ribosome:

[ribosome]
[
    [] swap
    [ribosome-worker]
    y
]
define
[ribosome-worker]
[
    # check if the mrna list contains at least 3 items
    [swap empty] [ ] [[swap rest empty] [ ] [[swap rest rest empty] [ ] [
        [dig2 empty]            # if polypeptide is empty
        [                       # then initiate
            swap split3 swap    # dig up mrna and split at 3

            [[a u g] equal]     # if codon equals [a u g]
            [                   # then
                translate       # translate codon to amino acid
                dig3            # dig out polypeptide
                swap cat        # dig out amino acid and concatenate to polypeptide
                bury2           # bury polypeptide
                swap            # bury mrna below [q]
                i               # recurse
            ]
            [                   # else
                rest            # drop the first element of the codon
                swap cat        # prepend the rest to mrna 
                swap            # bury mrna below [q]
                i               # recurse
            ]
            ifte
        ]
        [                       # else elongate
            swap split3 swap    # dig up mrna and split at 3
            translate           # translate codon to amino acid

            [empty]             # if amino acid is empty
            [                   # then
                bury2           # bury it as a new polypeptide
            ]
            [                   # else
                dig3            # dig out polypeptide
                swap cat        # dig out amino acid and concatenate to polypeptide
                bury2           # bury polypeptide
            ]
            ifte

            swap                # bury mrna below [q]
            i                   # recurse
        ]
        ifte
    ] ifte] ifte] ifte
]
define
[split3]
[
    [] swap
    dup [first unit cat] dip rest
    dup [first unit cat] dip rest
    dup [first unit cat] dip rest
]
define

The split3 function is the counterpart of the Elixir function Enum.split(mrna, 3). Its first effect is to bury an empty list underneath the mRNA, which it expects to be on top of the stack. It then duplicates the mRNA and concatenates the head of the first copy to the list underneath it on the stack, and then it drops the head of the second copy. This duplication process is repeated 3 times to correspond to the 3 in split3.

Austere, but verbose ribosome:

[ribosome]
[
    [] swap
    [
        [swap empty] [ ] [[swap rest empty] [ ] [[swap rest rest empty] [ ] [
            [[] cons cons dip empty]
            [
                swap
                [] swap
                dup [first unit cat] dip rest
                dup [first unit cat] dip rest
                dup [first unit cat] dip rest
                swap

                [[a u g] equal]
                [
                    translate
                    [] cons cons cons dip
                    swap cat
                    [[] cons cons] dip swap i
                    swap
                    i
                ]
                [
                    rest
                    swap cat
                    swap
                    i
                ]
                ifte
            ]
            [
                swap
                dup [first unit] dip rest
                dup [first unit cat] dip rest
                dup [first unit cat] dip rest
                swap
                translate

                [empty]
                [
                    [[] cons cons] dip swap i
                ]
                [
                    [] cons cons cons dip
                    swap cat
                    [[] cons cons] dip swap i
                ]
                ifte

                swap
                i
            ]
            ifte
        ] ifte] ifte] ifte
    ]
    [dup cons] swap cat dup cons i pop pop
]
define

We've sneaked in a few things not present in the initial Elixir code, but only so that we don't have to provide the initial empty polypeptide to the function. I.e. we can call ribosome with just an mRNA sequence:

[c a u g a a c a u u g a a a u g u g a a a a a u g a a a c] ribosome

The output of the above program are two lists of amino acids (a stop codon in the mix triggered the start of a new polypeptide):

[met asn ile glu met] [met lys]

So the Joy ribosome does what we set out to do at the beginning of this post, but it is not quite what we ultimately want and I've slipped in some cheating. Here are some things that still need to be addressed:

  1. The ribosome translates mRNA to amino acids according to the real genetic code, whereas we really want it to translate mRNA to primitive Joy functions or combinators (artificial amino acids) according to an artificial genetic code.

  2. The ribosome can output multiple polypeptides, but these peptides are hopelessly flat - they contain no quotations. In essence the produced lists correspond to unfolded (primary structure) proteins, but Joy programs without quotations would hardly be capable of any interesting behaviour.

  3. I've treated the translate function as a primitive of Joy. This explains why I haven't provided its definition above. I wasn't able to provide it, because it is only defined at a language level, i.e. in the underlying Elixir implementation of Joy. The end result is that I've hardcoded the genetic code, injecting it from the outside, instead of representing it within the system. For completeness, here is the definition of translate:

  def translate(stack) do
    [head | rest] = stack

    amino_acid =
      case head do
        [:u, :u, :u] -> [:phe]
        [:u, :c, :u] -> [:ser]
        [:u, :a, :u] -> [:tyr]
        [:u, :g, :u] -> [:cys]
        [:u, :u, :c] -> [:phe]
        [:u, :c, :c] -> [:ser]
        [:u, :a, :c] -> [:tyr]
        [:u, :g, :c] -> [:cys]
        [:u, :u, :a] -> [:leu]
        [:u, :c, :a] -> [:ser]
        [:u, :a, :a] -> [] # stop codon
        [:u, :g, :a] -> [] # stop codon
        [:u, :u, :g] -> [:leu]
        [:u, :c, :g] -> [:ser]
        [:u, :a, :g] -> [] # stop codon
        [:u, :g, :g] -> [:trp]
        [:c, :u, :u] -> [:leu]
        [:c, :c, :u] -> [:pro]
        [:c, :a, :u] -> [:his]
        [:c, :g, :u] -> [:arg]
        [:c, :u, :c] -> [:leu]
        [:c, :c, :c] -> [:pro]
        [:c, :a, :c] -> [:his]
        [:c, :g, :c] -> [:arg]
        [:c, :u, :a] -> [:leu]
        [:c, :c, :a] -> [:pro]
        [:c, :a, :a] -> [:gln]
        [:c, :g, :a] -> [:arg]
        [:c, :u, :g] -> [:leu]
        [:c, :c, :g] -> [:pro]
        [:c, :a, :g] -> [:gln]
        [:c, :g, :g] -> [:arg]
        [:a, :u, :u] -> [:ile]
        [:a, :c, :u] -> [:thr]
        [:a, :a, :u] -> [:asn]
        [:a, :g, :u] -> [:ser]
        [:a, :u, :c] -> [:ile]
        [:a, :c, :c] -> [:thr]
        [:a, :a, :c] -> [:asn]
        [:a, :g, :c] -> [:ser]
        [:a, :u, :a] -> [:ile]
        [:a, :c, :a] -> [:thr]
        [:a, :a, :a] -> [:lys]
        [:a, :g, :a] -> [:arg]
        [:a, :u, :g] -> [:met]
        [:a, :c, :g] -> [:thr]
        [:a, :a, :g] -> [:lys]
        [:a, :g, :g] -> [:arg]
        [:g, :u, :u] -> [:val]
        [:g, :c, :u] -> [:ala]
        [:g, :a, :u] -> [:asp]
        [:g, :g, :t] -> [:gly]
        [:g, :u, :c] -> [:val]
        [:g, :c, :c] -> [:ala]
        [:g, :a, :c] -> [:asp]
        [:g, :g, :c] -> [:gly]
        [:g, :u, :a] -> [:val]
        [:g, :c, :a] -> [:ala]
        [:g, :a, :a] -> [:glu]
        [:g, :g, :a] -> [:gly]
        [:g, :u, :g] -> [:val]
        [:g, :c, :g] -> [:ala]
        [:g, :a, :g] -> [:glu]
        [:g, :g, :g] -> [:gly]
      end

    [amino_acid | rest]
  end

To end with, the Joy programs I've presented here, while very pleasing to me, are by no means readable; not to the untrained eye. I should remind the reader that firstly I'm not an expert in Joy and am probably not using the most idiomatic code and secondly that I'm purposefully using tediously repetitive code in order to limit the number of different functions that are used. Anonymous recursion is for instance built into the standard Joy libraries, which I'm not using. Using real Joy is probably much closer to real joy.

Be that as it may. We have now built a workable first iteration of the most important enzyme of our artificial life system. In the next few posts we will focus on ironing out the remaining three issues with this implementation. This will bring us to a critical milestone: having a ribosome that can produce itself given an appropriate mRNA template.

Posted on by:

Discussion

pic
Editor guide