Tomasz Wegrzanowski

Posted on Jan 2, 2022

100 Languages Speedrun: Episode 43: Thue

#thue

Thue is a esoteric language based on string rewriting.

Program is a bunch of rewrite rules (string A to string B). Program state is a string. Thue picks a rule at random - if its left side is in the string somewhere, it will be replaced by the right side. And so on as long as it's possible. There are just a few special rules for input and output and that's all!

It's a very simple language, and absolute hell to code, so let's start!

I included a slightly fixed Thue interpreter in the repository.

Hello, World!

The simplest Hello World is this:

_::=~Hello, World!
::=
_

First we have list of rules. A rule has left side (match string), ::=, and right string (replacement). In this program there's just one rule, to replace _ by ~Hello, World!.

Empty rule means end of rules. After that there's initial program state _.

Normally rules specify replacement, but there are a few exceptions:

right side ::: means to take one line input from user, remove newline, and use that as replacement
right side starting with ~ means output whatever's the rest of the right side, without newline
right side that's just ~ means to print just newline

For some reason, some broken Thue implementations out there always print newline for every ~. That makes it impossible to print any line that doesn't have its own dedicated rule, which makes Thue basically useless.

This program doesn't print newline at the end of the hello string, so technically it's not a proper "Hello, World!" yet. We'll get to it.

Dice

This program rolls a random die:

_::=~1
_::=~2
_::=~3
_::=~4
_::=~5
_::=~6
::=
_

As there are 6 possible matches, it's up to Thue to decide which one will be chosen. I don't think anything about Thue guarantees that the choice will be "fair", but at least for such simple cases it's not a bad guess.

Many Dice - first attempt

Let's try something more complicated - roll multiple dice, separated by commas. An initial idea is to just do this:

roll::=~1
roll::=~2
roll::=~3
roll::=~4
roll::=~5
roll::=~6
,::=~,
$::=~
::=
roll,roll,roll$

Every roll gets replaced by a printed number, every , by a printed comma, and every $ by a printed newline. Sound fine, right?

$ thue ./many_dice.thue
5,63,

Right, Thue is nondeterministic, so not only we can't guarantee which rule will match, but which part of the string will. Commas and newlines can end up anywhere in the output.

Many Dice - fixed

^roll::=^ROLL
^,::=^COMMA
^$::=~
ROLL::=~1
ROLL::=~2
ROLL::=~3
ROLL::=~4
ROLL::=~5
ROLL::=~6
COMMA::=~,
::=
^roll,roll,roll$

Let's try something else - a command first gets activated when its time is right - and for our simple program that means when the only thing before it is a ^. Activated command gets executed.

$ thue ./many_dice2.thue
2,6,1

If we follow program state step by step, here's what program state is, and which rule gets matched:

^roll,roll,roll$ - matches ^roll
^ROLL,roll,roll$ - matches one of ^ROLL rules
^,roll,roll$ - matches ^,
^COMMAroll,roll$ - matches COMMA
^roll,roll$ - matches ^roll
^ROLL,roll$ - matches one of ^ROLL rules
^,roll$ - matches ^,
^COMMAroll$ - matches COMMA
^roll$ - matches ^roll
^ROLL$ - matches one of ^ROLL rules
^$ - matches ^$

I think it's a useful convention to put ^ at start of the program state and $ at the end. They don't mean anything special to Thue, but they look sort of like regexps to people. Just remember to not remove them except when the program is done.

Say Hello to a specific person

This is a 170 line program, because we need to duplicate certain rules for every single letter, lower case, and every single letter, upper case. I'll skip repetitive letters:

_::=:::
# let the < pass through to the left ::=
a<::=<a
b<::=<b
c<::=<c
...
Z<::=<Z
# if < reached ^ then we are ready to start printing ::=
# we need this step so it waits for user input before it starts to print ::=
^<::=^*!<
!<::=~Hello,
# activate print letter command ::=
^*a::=^*>a
^*b::=^*>b
^*c::=^*>c
...
^*Z::=^*>Z
# execute print letter ::=
>a::=~a
>b::=~b
>c::=~c
...
>Z::=~Z
# we're done, print exclamation mark and newline ::=
^*$::=&&12
&1::=~!
&2::=~
::=
^_<$

First thing to note are "comments" in Thue. Technically there aren't any, but any rule that will never match might as well be a comment, so feel free to use them - just always include ::= to make it parse correctly.

Let's confirm that it works:

$ thue hello2.thue
Alice
Hello, Alice!

I think instead of explaining all the rules in the abstract, it's better to show program steps as they happen:

^_<$ - _ rule matches, waiting for user input, user inputs Alice.
^Alice<$ - e< rule matches, transforming it to moves to <e
^Alic<e$ - c< rule matches, transforming it to moves to <c
^Ali<ce$ - i< rule matches, transforming it to moves to <i
^Al<ice$ - l< rule matches, transforming it to moves to <l
^A<lice$ - A< rule matches, transforming it to moves to <A
^<Alice$ - ^< rule matches, program knows user input is done, so we can start printing
^*!<Alice$ - !< rule matches, printing Hello, (there's a space at end of that line)
^*Alice$ - *A rule matches, activating the print letter command
^*>Alice$ - >A rule matches, printing the letter A - we need to do this in two steps, as print commands cannot have any replacement. If they could, we'd just have one rule *A - print A and replace with *
^*lice$ - *l rule matches, activating the print letter command
^*>lice$ - >l rule matches, printing the letter l
^*ice$ - *i rule matches, activating the print letter command
^*>ice$ - >i rule matches, printing the letter i
^*ce$ - *c rule matches, activating the print letter command
^*>ce$ - >c rule matches, printing the letter c
^*e$ - *e rule matches, activating the print letter command
^*>e$ - >e rule matches, printing the letter e
^*$ - we reached the end, so ^*$ rule matches, setting up the final print - as print newline command is separate we need to setup two commands
&&12 - &1 rule matches, printing ! - &2 rule is intentionally split apart, so these happen in the right order
&2 - and finally &2 rule matches, printing newline; at this point the program is empty and nothing else matches, so it exits

So on one hand it was all very elegant. On the other hand we needed 170 rules.

So it's really not surprising that someone created Rue - Thue with regular expression rules, and modules. Rue "modules" are simply file includes so you can put those long blocks of rules for print etc. in one file and reuse in multiple programs instead of copying and pasting. They don't provide anything that Cmd-C Cmd-V doesn't, just make it less tedious. Regular expressions on the other hand, while they can be used to just replace a bunch of such pointlessly repetitive rules with one, I feel they might also be way too powerful, and trivialize the challenge a bit. Or maybe it's all fine, I haven't tried to use Rue yet.

Add One

Nontrivial Thue programs tend to run out of characters, and start using difficult to read combinations. But why not use emoji instead? Here's a program to one to user's input numbers. It's also somewhat repetitive but as it's just 9 same cases (0-8, 9 is unique), not 52, I included the whole program. I haven't seen anyone else use emoji with Thue, I think it improves it.

You just need to be careful with emoji that consist of multiple Unicode codepoints, as you might get half of emoji matching. It's not generally a problem for anything matching left to right, but Thue by design can match anywhere at all.

# using some Unicode to make things more readable ::=
_::=:::
# rules for adding one to a digit ::=
# only 9 is special ::=
0🧲::=✅1
1🧲::=✅2
2🧲::=✅3
3🧲::=✅4
4🧲::=✅5
5🧲::=✅6
6🧲::=✅7
7🧲::=✅8
8🧲::=✅9
9🧲::=🧲0
# if extra carry arrived, add leading 1 ::=
^🧲::=^✅1
# we're done, so just let ✅ pass through ::=
0✅::=✅0
1✅::=✅1
2✅::=✅2
3✅::=✅3
4✅::=✅4
5✅::=✅5
6✅::=✅6
7✅::=✅7
8✅::=✅8
9✅::=✅9
# print the result ::=
# first activate each letter with 📄 ::=
^✅::=^🖨
🖨0::=🖨📄0
🖨1::=🖨📄1
🖨2::=🖨📄2
🖨3::=🖨📄3
🖨4::=🖨📄4
🖨5::=🖨📄5
🖨6::=🖨📄6
🖨7::=🖨📄7
🖨8::=🖨📄8
🖨9::=🖨📄9
# now print the activated letter ::=
📄0::=~0
📄1::=~1
📄2::=~2
📄3::=~3
📄4::=~4
📄5::=~5
📄6::=~6
📄7::=~7
📄8::=~8
📄9::=~9
# print newline once we're done with all the digits
^🖨$::=~
::=
^_🧲$

Let's give it a try:

$ ~/d/Thue/src/thue.rb addone.thue
0
1
$ ~/d/Thue/src/thue.rb addone.thue
68
69
$ ~/d/Thue/src/thue.rb addone.thue
419
420
$ ~/d/Thue/src/thue.rb addone.thue
999999
1000000

I hope included comments and emoji are clear enough, but just in case, here's the state step by step for input 29:

^_🧲$ - _ rule matches, it gets replaced by user input 29
^29🧲$ - 9🧲 rule matches, it gets replaced by 🧲0, as we need to increment previous digit
^2🧲0$ - 2🧲 rule matches, it gets replaced by ✅3
^✅30$ - ^✅ rule matches, we're done incrementing, and can start printing
^🖨30$ - 🖨3 rule matches, digit printing activates, replacing it with 🖨📄3
^🖨📄30$ - 📄3 rule matches, printing 3
^🖨0$ - 🖨0 rule matches, digit printing activates, replacing it with 🖨📄0
^🖨📄0$ - 📄0 rule matches, printing 0
^🖨$ - as there are no more digits, ^🖨$ rule matches and prints newline
state is empty and there are no further matches, so the program stops

Loop

As we know how to increment numbers and how to print them, it shouldn't be hard to create a loop, we just need to keep going after a number instead of finishing the program.

The other change compared with the previous program is that printing pass leaves the number behind on its left as it passes through, instead of deleting it.

# rules for adding one to a digit ::=
# only 9 is special ::=
0🧲::=✅1
1🧲::=✅2
2🧲::=✅3
3🧲::=✅4
4🧲::=✅5
5🧲::=✅6
6🧲::=✅7
7🧲::=✅8
8🧲::=✅9
9🧲::=🧲0
# if extra carry arrived, add leading 1 ::=
^🧲::=^✅1
# we're done, so just let ✅ pass through ::=
0✅::=✅0
1✅::=✅1
2✅::=✅2
3✅::=✅3
4✅::=✅4
5✅::=✅5
6✅::=✅6
7✅::=✅7
8✅::=✅8
9✅::=✅9
# print the result ::=
# first activate each letter with 📄, but leave a copy behind ::=
^✅::=^🖨
🖨0::=0🖨📄0
🖨1::=1🖨📄1
🖨2::=2🖨📄2
🖨3::=3🖨📄3
🖨4::=4🖨📄4
🖨5::=5🖨📄5
🖨6::=6🖨📄6
🖨7::=7🖨📄7
🖨8::=8🖨📄8
🖨9::=9🖨📄9
# now print the activated letter ::=
📄0::=~0
📄1::=~1
📄2::=~2
📄3::=~3
📄4::=~4
📄5::=~5
📄6::=~6
📄7::=~7
📄8::=~8
📄9::=~9
# once printing finished, switch to increment mode 🧲
🖨$::=🏁🧲$
🏁::=~
::=
^🖨1$

In prints numbers forever, so let's check representative part of execution:

^68🧲$ - printing 68 just finished, the program enters the increment mode, with 🧲 going left and incrementing everything as it goes, rule 8🧲 matches rewrites it to ✅9
^6✅9$ - rule 6✅ matches rewriting it to ✅6, ✅ will just keep going to the left until it hits the end
^✅69$ - the program reached the end so it switches to printing mode, rule ^✅ matches, it gets replaced by ^🖨
^🖨69$ - 🖨6 rule matches, it gets replaced by 6🖨📄6 - this is a non-deleting printer; as it 🖨 never looks to its left, all the printing works the same
^6🖨📄69$ - rule 📄6 matches, printing 6
^6🖨9$ -🖨6 rule matches, it gets replaced by 9🖨📄9
^69🖨📄9$ - rule 📄9 matches, printing 9
^69🖨$ - rule 🖨$ matches, program prepares to print newline and switch to increment mode
^69🏁🧲$ - rule 🏁 matches, printing newline
^69🧲$ - printing 69 just finished, the program enters the increment mode, with 🧲 going left and incrementing everything as it goes and so on

FizzBuzz

This will be infinite FizzBuzz. We're not going to do any modulo arithmetic, we'll just use three numbers, all being incremented by 1 every loop iteration.

Program state starts as ^0,0,0🧲$ and in increment mode, the 🧲 will go left incrementing all the numbers it passes through. Then in printing mode we need to do some trickery to reset the first two numbers to 0 when they hit their max (3 and 5 respectively), and choose the right printer.

# rules for adding one to a digit ::=
# only 9 is special ::=
0🧲::=✅1
1🧲::=✅2
2🧲::=✅3
3🧲::=✅4
4🧲::=✅5
5🧲::=✅6
6🧲::=✅7
7🧲::=✅8
8🧲::=✅9
9🧲::=🧲0
# if extra carry arrived, add leading 1 ::=
^🧲::=^✅1
# we're done, so just let ✅ pass through ::=
0✅::=✅0
1✅::=✅1
2✅::=✅2
3✅::=✅3
4✅::=✅4
5✅::=✅5
6✅::=✅6
7✅::=✅7
8✅::=✅8
9✅::=✅9
# print the result ::=
# first activate each letter with 📄, but leave a copy behind ::=
^✅::=^🖨
🖨0::=0🖨📄0
🖨1::=1🖨📄1
🖨2::=2🖨📄2
🖨3::=3🖨📄3
🖨4::=4🖨📄4
🖨5::=5🖨📄5
🖨6::=6🖨📄6
🖨7::=7🖨📄7
🖨8::=8🖨📄8
🖨9::=9🖨📄9
# now print the activated letter ::=
📄0::=~0
📄1::=~1
📄2::=~2
📄3::=~3
📄4::=~4
📄5::=~5
📄6::=~6
📄7::=~7
📄8::=~8
📄9::=~9
# once printing finished, switch to increment mode 🧲
🖨$::=🏁🧲$
🏁::=~
::=
^🖨1$

Let's follow its execution for 68 (number) and 69 (Fizz):

^1,2,67🧲$ - printing of 67 finished, time to increment mode
^1,2,6✅8$
^1,2,✅68$ - it passes through 67🧲 incrementing it to ✅68
^1,2🧲,68$ - it hit comma, so back to increment mode, we'll be incrementing the "mod 5" number
^1,✅3,68$ - it passes though 2🧲 incrementing it to ✅3
^1🧲,3,68$ - it hit comma, so back to increment mode, we'll be incrementing the "mod 3" number
^✅2,3,68$ - it passes though 1🧲 incrementing it to ✅2
^📆2,3,68$ - incrementing mode finished, time to start moving right, 📆 mode selects printing mode, matching 📆2,3, and since none of the counters are maxed out, just doing 2,3,🖨 so normal number print mode (we could also process these one at a time, but we'd need more intermediate emoji states)
^2,3,🖨68$
^2,3,6🖨📄68$
^2,3,6🖨8$
^2,3,68🖨📄8$
^2,3,68🖨$ - it went through the number printing it as usual
^2,3,68🏁🧲$ - just needs to print newline
^2,3,68🧲$ - and go back to incrementing mode
^2,3,6✅9$
^2,3,✅69$ - it passes through 68🧲 incrementing it to ✅69
^2,3🧲,69$ - it hit comma, so back to increment mode, we'll be incrementing the "mod 5" number
^2,✅4,69$ - it passes though 3🧲 incrementing it to ✅4
^2🧲,4,69$ - it hit comma, so back to increment mode, we'll be incrementing the "mod 3" number
^✅3,4,69$ - it passes though 2🧲 incrementing it to ✅3
^📆3,4,69$ - incrementing mode finished, time to start moving right, 📆 mode selects printing mode, matching 📆3,4, and since "mod 3" counter is maxed out, replacing it with 0,4,🍙📄F - counter resets from 3 to 0, and we're in 📄F mode
^0,4,🍙📄F69$ - 📄F matches, printing Fizz
^0,4,🍙69$ - 🍙 passes through the number until it hits the end
^0,4,6🍙9$
^0,4,69🍙$ - 🍙$ works same as 🖨$, it just needs to print newline and switch back to increment mode
^0,4,69🏁🧲$ - 🏁 prints newline
^0,4,69🧲$ - and we're back in increment mode

Should you try Thue?

I definitely recommend giving it a try. As far as I can tell I might be the first person to do the Emoji Thue variant, but all Thue interpreters out there should support emoji.

Thue is a really good esoteric language, but it suffers from a few issues:

a lot of repetitive rules, for example to support printing stuff we needed 3 rules for every letter - I think at least single character regular expressions like ([0-9])✅::=✅\1 would really clean up the rule mass and make it easier to try more challenging programs; but full regexps would add huge amount of extra power, so I don't know what's the best balance
having to use symbol clusters for any bigger program (Emoji Thue is one way around it)
the I/O rules especially around newlines are a bit awkward

Code

All code examples for the series will be in this repository.

Code for the Thue episode is available here.

DEV Community

100 Languages Speedrun: Episode 43: Thue

Hello, World!

Dice

Many Dice - first attempt

Many Dice - fixed

Say Hello to a specific person

Add One

Loop

FizzBuzz

Should you try Thue?

Code

Top comments (0)

Read next

New AI Pattern-Matching Method Simplifies and Improves Data Generation

AI Safety Breakthrough: 80% Smaller Models Match Full Performance in Harmful Content Detection

Small AI Models Outperform Giants in Grading Language Tasks, New Study Shows

New ML Compiler Uses Pattern Matching to Speed Up AI Code, Verified with Formal Proofs