Predicting the tertiary structure and function of a protein, given only its primary amino acid sequence, is one of the big open questions in biology. Some proteins fold entirely on their own, but most multi-domain proteins rely on so-called chaperones, proteins that assist other proteins in folding.
In this post we develop a scheme by which non-functional flat Joy programs (enzymes in our artificial life system) can be converted into functional structured Joy programs with the help of chaperone programs. Such a scheme is required to turn Joy programs produced by the process of translation into fully fledged artificial enzymes.
Anfinsen's dogma states, broadly speaking, that the tertiary structure of a protein in a physiological cellular context is determined completely by its primary structure (its amino acid sequence). Deducing a protein's tertiary structure from its primary structure is, however, one of the big open questions of molecular biology, and has given rise to Levinthal's paradox, which roughly states that even for simple proteins the universe isn't old enough to allow for the sequential traversal of conformational space in order to reach the lowest energy conformation.
Of course, in cells, proteins fold on microsecond to millisecond time scales. Clearly, the physiological context within which proteins fold significantly reduces the search space.
Most multidomain proteins actually rely on other proteins, called chaperones, to help them fold. Some chaperones do speed up the folding process, but mostly they prevent the aggregation of partially folded proteins. As such, these chaperones don't violate Anfinsen's dogma, because they don't lower the final free energy state of the folded protein.
However, the native structure of some proteins is less stable than their unfolded structures, requiring chaperones to guide them into their functional conformation. In these cases Anfinsen's dogma is in fact violated.
Anfinsen's dogma perhaps overemphasizes the role of the protein's primary structure. While the primary structure is undoubtedly of paramount significance, the intracellular context or intracellular milieu, which includes factors such as the pH, ionic strength, and chaperones, is an indispensable contributor to the functional folding of proteins.
It is also significant that the intracellular milieu is actively maintained by the cell itself, through active cross-membrane transport and strict regulation of chaperone expression.
In some sense, therefore, the intracellular milieu can itself be regarded as an "enzyme" that catalyses the transformation of polypeptide primary structures into polypeptide tertiary structures. And this "enzyme" is itself the product of translation and protein folding. The role of the intracellular milieu is unpacked in more detail in the article "Basic biological anticipation" by Jannie Hofmeyr.
In the previous post we developed the first iteration of a ribosome, programmed in a subset of Joy, that is able to convert an mRNA strand (a flat list of RNA bases) into one or more polypeptides (flat lists of amino acids).
One of our ultimate goals is to produce polypeptides made up of artificial amino acids in the form of Joy functions, so that they can operate as artificial enzymes in the form of Joy programs. Most useful Joy programs are, however, highly structured compared to the flat programs produced by our ribosome.
Here we devise a scheme by which such unstructured Joy programs can be converted into their fully functional structured counterparts by means of a chaperone Joy program.
It turns out that a crucial part of the work has already been done by Joy language enthusiasts (see Floy - a flat concatenative subset of Joy). The Joy language community posed the question of whether it would be possible to create a subset of Joy, called Floy, that only allows for functions and empty or single-element lists, with no nesting of lists.
It was subsequently shown that any Joy program J can be converted by a flattening program j2f into its flat counterpart, a Floy program F, such that:
[J] j2f == [F]
Executing [F] results in [J], which needs to be evaluated a second time should one want to execute the original program:
[F] i == [J]
[F] i i == J
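To make the identity [F] i == [J] concrete, here is a minimal sketch in Elixir, the language used for prototyping later in this post. It rests on an assumed representation. A Joy program is an Elixir list, quotations are nested lists, functions are atoms, and the top of the stack is the head of the list. The module name FloyEval is made up for illustration. Only the three words a Floy program needs to rebuild its Joy counterpart (cons, cat and i) are interpreted.

```elixir
defmodule FloyEval do
  # Hypothetical mini-evaluator for the words a Floy program needs.
  # Assumed representation: quotations are Elixir lists, functions are
  # atoms, numbers are integers; the head of `stack` is the top of stack.

  def run(program, stack \\ []), do: Enum.reduce(program, stack, &step/2)

  # X [A] cons == [X A]
  defp step(:cons, [list, x | rest]) when is_list(list), do: [[x | list] | rest]
  # [A] [B] cat == [A B]
  defp step(:cat, [b, a | rest]) when is_list(a) and is_list(b), do: [a ++ b | rest]
  # [P] i == P
  defp step(:i, [q | rest]) when is_list(q), do: run(q, rest)

  defp step(word, stack) when word in [:cons, :cat, :i],
    do: raise(ArgumentError, "cannot apply #{word} to #{inspect(stack)}")

  # Everything else (quotations, numbers, other atoms) is pushed as data.
  defp step(other, stack), do: [other | stack]
end

# A Floy program F for the Joy program J = [1 2] swap
f = [[], [], [1], :cat, [2], :cat, [], :cons, :cat, [:swap], :cat]
FloyEval.run([f, :i])
# => [[[1, 2], :swap]], a stack holding the single quotation [J]
```

Running [F] i leaves exactly [J] on the stack. Executing J itself (the second i) would need swap and the other Joy primitives, which this sketch deliberately leaves out.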
There is more than one way to implement j2f, but without going into the details, we will be using the "forwards" variant from the article linked above. It can be defined like this:
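[j2f-f] [
  [ list ]                                           # if the element is a quotation
  [ [[[]] cat] dip [j2f-f] step [[] cons cat] cat ]  # then emit [], flatten its contents, emit [] cons cat
  [ [] cons [cat] cons cat ]                         # else emit [element] cat
  ifte
] define
[j2f] [
  [[]] swap [j2f-f] step                             # start from [[]], so that F begins with an initial []
] define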
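For readers who find the Joy definition hard to follow, here is a hedged Elixir sketch of the same forwards flattening. It assumes that quotations are Elixir lists, functions are atoms and numbers are integers. The module name Floy and the function j2f/1 are hypothetical. The private j2f_f/2 mirrors j2f-f, and the accumulator starts as [[]], just as the Joy j2f starts from [[]].

```elixir
defmodule Floy do
  # Hypothetical port of the "forwards" j2f. Assumed representation:
  # quotations are Elixir lists, functions are atoms, numbers are integers.

  # j2f: start from [[]] and fold j2f_f over the program's elements.
  def j2f(program), do: Enum.reduce(program, [[]], &j2f_f/2)

  # A quotation becomes [] ... [] cons cat, with its flattened contents in between.
  defp j2f_f(element, acc) when is_list(element) do
    acc = acc ++ [[]]
    acc = Enum.reduce(element, acc, &j2f_f/2)
    acc ++ [[], :cons, :cat]
  end

  # Any other element becomes [element] cat.
  defp j2f_f(element, acc), do: acc ++ [[element], :cat]
end

Floy.j2f([[1, 2, 3], [:dup, :*], :map])
# => [[], [], [1], :cat, [2], :cat, [3], :cat, [], :cons, :cat,
#     [], [:dup], :cat, [:*], :cat, [], :cons, :cat, [:map], :cat]
```

The result matches, token for token, the output of [P] j2f for the example program P = [1 2 3] [dup *] map given below. Appending with ++ makes this quadratic in the program length. That is acceptable for a sketch but not for long programs.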
(Remember that this define function is my own creation and not part of standard Joy.)
Given a program P defined as
[1 2 3] [dup *] map
the output of [P] j2f is
[[] [] [1] cat [2] cat [3] cat [] cons cat [] [dup] cat [*] cat [] cons cat [map] cat]
With the appropriate amount of eye strain, it can be seen that there is a mapping between the symbols in [P] (left) and the symbols in [F] (right):
[ -> []
] -> [] cons cat
function -> [function] cat
Here, "function" stands for any of the Joy functions/combinators, including numbers and other data types. The only symbol present in [F] that is not represented in [P] is an initial []. We will address this discrepancy in due time.
F is flat in the sense that it contains no nested quotations, and all its quotations are either empty or contain only a single element. Despite being flat, F still contains symbols, namely all the [s and ]s, that cannot be produced as individual elements of the polypeptide sequence produced by the ribosome developed in the previous post. That is, there are no codons that map to either [ or ]. And it is in fact not possible to devise such codons: [ and ] are delimiters of Joy's list syntax rather than functions, so a lone bracket cannot be an element of a list.
What we really need is a mapping, similar to the one above, but in which all the symbols on the left-hand side are functions. Here is a mapping that will do:
bra -> []
ket -> [] cons cat
function -> [function] cat
Here, bra and ket are identity functions, or functions whose effects are otherwise inconsequential. If we can now devise a chaperone Joy program that takes a list of Joy functions (artificial amino acids) and maps each one according to the mapping above, then we have a scheme by which we can turn completely flat (no lists at all) Joy programs into kind-of-flat Joy programs (ones like F that only contain empty or unit lists), which, when executed, yield fully structured Joy programs:
mRNA --(ribosome)--> unfolded polypeptide --(chaperone)--> folded polypeptide
Or in Joy:
[mrna] ribosome chaperone == folded_protein
As a first step, here is a chaperone programmed in Elixir. Note that I'm deliberately refraining from using idiomatic Elixir and high-level functions like Enum.reduce, to make translation to Joy easier.
# Maps a flat polypeptide onto a Floy program:
#   :bra -> []   :ket -> [] cons cat   function -> [function] cat
def chaperone_worker(protein \\ [], polypeptide) do
  if Enum.empty?(polypeptide) do
    protein
  else
    [head | rest] = polypeptide
    protein =
      if head == :bra do
        protein ++ [[]]
      else
        if head == :ket do
          protein ++ [[], :cons, :cat]
        else
          protein ++ [[head], :cat]
        end
      end
    chaperone_worker(protein, rest)
  end
end
The chaperone_worker function above covers the core mapping, but it neither inserts the initial extra [] nor unquotes (executes) its result so as to yield a final quoted "protein". We will use a wrapper function to add these final touches, but first, here is the corresponding Joy chaperone-worker:
[chaperone-worker] [
  # We expect the stack to look like this from bottom to top:
  # protein polypeptide [q]
  [swap empty]            # if the polypeptide is empty
  [pop]                   # then drop [q], which ends the recursion
  [                       # else
    swap                  # dig out polypeptide
    dup [rest] dip first  # split into rest and first
    [bra equal]           # if head equals bra
    [                     # then
      pop                 # get rid of bra (!)
      [[]]                # put mapped value on the stack
    ]
    [                     # else
      [ket equal]         # if head equals ket
      [                   # then
        pop               # get rid of ket
        [[] cons cat]     # put mapped value on the stack
      ]
      [                   # else
        unit              # wrap function in quotes
        [cat] cons        # cons it into a list containing cat
      ]
      ifte
    ]
    ifte
    dig3                  # dig out protein
    swap                  # dig out mapped value
    cat                   # append mapped value to protein
    bury2                 # bury protein below rest of polypeptide and [q]
    swap                  # bury polypeptide below [q]
    i                     # recurse
  ]
  ifte
] define
And here is the full chaperone:
[chaperone] [
  [] swap                 # start with an empty protein
  [chaperone-worker] y    # convert worker to recursive function
  pop                     # remove the empty polypeptide
  [] swap cons            # introduce initial [] into protein
  i                       # unquote to fully structured protein
] define
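As a cross-check of the wrapper's extra steps, here is a hedged Elixir sketch that mirrors the Joy chaperone end to end. It starts from an empty protein, maps the polypeptide, prepends the initial [] and unquotes with i. It is not the author's code. The names ChaperoneSketch, worker/1 and fold/1 and the small evaluator are hypothetical. The worker uses Enum.flat_map instead of the explicit recursion shown earlier, and only cons, cat and i are interpreted.

```elixir
defmodule ChaperoneSketch do
  # Hypothetical sketch. Assumed representation: quotations are Elixir
  # lists, amino acids/functions are atoms, the head of a stack is its top.

  # Core mapping: bra -> [], ket -> [] cons cat, function -> [function] cat
  def worker(polypeptide) do
    Enum.flat_map(polypeptide, fn
      :bra -> [[]]
      :ket -> [[], :cons, :cat]
      amino_acid -> [[amino_acid], :cat]
    end)
  end

  # Prepend the initial [], then unquote the Floy program with i.
  # Unbalanced bra/ket input raises (ArgumentError or MatchError).
  def fold(polypeptide) do
    floy = [[] | worker(polypeptide)]
    [protein] = run([floy, :i], [])
    protein
  end

  defp run(program, stack), do: Enum.reduce(program, stack, &step/2)

  defp step(:cons, [list, x | rest]) when is_list(list), do: [[x | list] | rest]
  defp step(:cat, [b, a | rest]) when is_list(a) and is_list(b), do: [a ++ b | rest]
  defp step(:i, [q | rest]) when is_list(q), do: run(q, rest)

  defp step(word, stack) when word in [:cons, :cat, :i],
    do: raise(ArgumentError, "cannot apply #{word} to #{inspect(stack)}")

  defp step(other, stack), do: [other | stack]
end

ChaperoneSketch.fold([:bra, :bra, :ket, :swap, :dup, :ket, :i])
# => [[[], :swap, :dup], :i]
```

The flat_map variant makes explicit that each amino acid is mapped independently of its neighbours. All of the nesting emerges only when the intermediate program is executed.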
As in the previous post, we don't want intermediate functions and chunked definitions like dign, and will replace all definitions recursively until they are irreducible. There is, however, nothing to be gained from doing that now.
Let's rather look at an example. Suppose the following program is the output of a ribosome function:
[bra bra ket swap dup ket i]
The ribosome produced this from an input mRNA sequence according to an artificial genetic code that we haven't defined yet. The chaperone function now converts it to an intermediate form (a Floy program):
[[] [] [] [] cons cat [swap] cat [dup] cat [] cons cat [i] cat]
And then, right before releasing it, the chaperone converts it to its final form (a Joy program again) by unquoting it with i:
[[] [] [] [] cons cat [swap] cat [dup] cat [] cons cat [i] cat] i
[] [] [] [] cons cat [swap] cat [dup] cat [] cons cat [i] cat    # by i
[] [] [[]] cat [swap] cat [dup] cat [] cons cat [i] cat          # by cons
[] [[]] [swap] cat [dup] cat [] cons cat [i] cat                 # by cat
[] [[] swap] [dup] cat [] cons cat [i] cat                       # by cat
[] [[] swap dup] [] cons cat [i] cat                             # by cat
[] [[[] swap dup]] cat [i] cat                                   # by cons
[[[] swap dup]] [i] cat                                          # by cat
[[[] swap dup] i]                                                # by cat
We now have a quoted protein
[[[] swap dup] i]
which may as well have been a quoted chaperone, had we provided the appropriate input. It is tempting to draw parallels between the primary, secondary, and tertiary structures of proteins and the three stages of Joy programs we see here. The primary structure corresponds to:
[bra bra ket swap dup ket i]
The secondary structure corresponds to the intermediate step, which is never directly observed:
[[] [] [] [] cons cat [swap] cat [dup] cat [] cons cat [i] cat]
And finally the tertiary structure corresponds to:
[[[] swap dup] i]
There is clearly a strong link between the primary and tertiary forms:
[ bra bra ket swap dup ket i ]
[ [   [   ]   swap dup ]   i ]
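For comparison only, the same nesting can be obtained without an intermediate Floy program. The approach is to treat bra and ket as matching brackets and build the lists directly. The hedged Elixir sketch below makes the primary-to-tertiary link explicit. DirectFold.fold/1 is hypothetical, and this is a conventional bracket-matching fold, not the chaperone scheme of this post. The chaperone reaches the same result through an executable intermediate form.

```elixir
defmodule DirectFold do
  # Hypothetical comparison: nest a flat polypeptide by treating :bra/:ket
  # as brackets. `open` is a stack of partially built lists, innermost
  # first; each is kept reversed so that adding an element is a prepend.
  def fold(polypeptide) do
    result =
      Enum.reduce(polypeptide, [[]], fn
        :bra, open -> [[] | open]
        :ket, [inner, outer | rest] -> [[Enum.reverse(inner) | outer] | rest]
        :ket, [_] -> raise(ArgumentError, "unmatched ket")
        word, [current | rest] -> [[word | current] | rest]
      end)

    case result do
      [top] -> Enum.reverse(top)
      _ -> raise(ArgumentError, "unmatched bra")
    end
  end
end

DirectFold.fold([:bra, :bra, :ket, :swap, :dup, :ket, :i])
# => [[[], :swap, :dup], :i]
```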
This highlights the role of the primary structure in determining the tertiary structure, but this fidelity is at the mercy of the chaperone, which we devised here to honour the mapping. The chaperone is, however, free to evolve (or at least it will be) and could therefore conceivably deviate from this strict mapping.
Just as the intracellular milieu, of which chaperones could be considered to form a part, actively participates in protein folding in real cells, we have here devised a scheme by which a programmatic chaperone facilitates the construction of higher-order Joy functions from flat primary templates.
In coming posts, we will define an artificial genetic code and update the previously developed ribosome to make use of this code. This will then allow us to produce quoted ribosomes and chaperones from appropriately crafted mRNA templates:
[ribosomal-mrna] ribosome chaperone == [ribosome]
[chaperone-mrna] ribosome chaperone == [chaperone]