When I started the series I said that some of the episodes will be about creating a new programming language, but so far it's just been RPN calculators.
So let's create a better Thue! I made an episode about Thue, so read that for some background.
Thue is a fun esoteric language, but suffers from two annoying issues:
- newline handling requires special weird rules
- a lot of nearly identical rules needed to be copied and pasted as there are no wildcard matches
I'll be using Crystal to code the interpreter.
Basic Interpreter
Before I start adding new features, let's write a base interpreter.
class ThueRule
getter :left
def initialize(@left : String, @right : String)
end
def apply(str, idx)
before = str[0, idx]
after = str[idx+@left.size .. -1]
if @right == "~"
print "\n"
replacement = ""
elsif @right[0] == '~'
print @right[1..-1]
replacement = ""
elsif @right == ":::"
replacement = STDIN.gets.not_nil!.chomp
else
replacement = @right
end
before + replacement + after
end
def to_s(io)
io << "Rule<#{@left}::=#{@right}>"
end
end
class ThueProgram
def initialize
@rules = [] of ThueRule
@initial = ""
@state = ""
end
def load(path)
lines = File.read_lines(path).map(&.chomp).zip(1..)
while lines.size > 0
line, line_no = lines.shift
# Ignoring invalid rules, they are sometimes used as comments
next unless line =~ /\A(.*)::=(.*)\z/
break if $1 == ""
@rules << ThueRule.new($1, $2)
end
@initial = lines.map(&.first).join("\n")
end
def random_match
match = nil
matches_count = 0
@rules.each do |rule|
idx = 0
loop do
match_idx = @state.index(rule.left, idx)
break unless match_idx
idx = match_idx + 1
matches_count += 1
next unless rand(matches_count) == 0
match = {rule: rule, idx: match_idx}
end
end
match
end
def run
@state = @initial
while match = random_match
rule = match[:rule]
idx = match[:idx]
if ENV["DEBUG"]?
STDERR.puts "Applying rule #{rule} at #{idx} to #{@state.inspect}"
end
@state = rule.apply(@state, idx)
end
end
end
unless ARGV[0]
STDERR.puts "Usage: #{$0} <file.thue>"
exit 1
end
prog = ThueProgram.new
prog.load(ARGV[0])
# Crystal doesn't handle SIGPIPE well and we want to support:
# crystal thue.cr examples/fizzbuzz.thue | head -n 100
begin
prog.run
rescue e : IO::Error
exit if e.os_error == Errno::EPIPE
raise e
end
Step by step:
-
ThueRule
represents a Thue rule likea::=b
-
ThueRule
knows how to apply itself to a string at certain position, so it handles input and output itself -
ThueProgram
is defined as a collection of rules and initial state -
ThueProgram#load
knows how to read itself from Thue file -
ThueProgram#random_match
picks a random rule at random index according to fair algorithm, without building the whole list in memory -
ThueProgram#run
runs the program, starting from initial state - if we run it without passing Thue file name, we print short message and exit
- if STDOUT gets closed, we catch an exception and check if it's the right error, then exit quietly. Crystal lacks any good way to quit on SIGPIPE the way Ruby's
trap("PIPE", "EXIT")
does it
We can confirm the program is working on a few test programs:
$ crystal thue.cr examples/fizzbuzz.thue | head -n 10
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
$ crystal thue.cr examples/many_dice2.thue
6,4,2
$ DEBUG=1 crystal thue.cr examples/hello.thue
Applying rule Rule<_::=~Hello, World!> at 0 to "_"
Hello, World!
Better newline handling
The first issue with Thue I want to address is newline handling. A printing rule cannot print newlines, except the ~
rule which prints just the newline. It makes Thue I/O unnecessarily annoying. The easy way to fix it is by supporting escape codes in Rules. \n
and \\
are all we need for now. But since we're fixing things we might just as well add one extra escape code \s
for space. This is mainly because sometimes Thue code needs space at end of the line, and a lot of editors strip that automatically.
This only required rewriting ThueRule
a little. I also added some more output triggered by ENV["DEBUG"]
.
class ThueRule
@left : String
@right : String
getter :left
def initialize(left, right)
@left = process_backslashes(left)
@right = process_backslashes(right)
@right = "~\n" if @right == "~"
end
def apply(str, idx)
before = str[0, idx]
after = str[idx+@left.size .. -1]
if @right[0] == '~'
print @right[1..-1]
replacement = ""
elsif @right == ":::"
replacement = STDIN.gets.not_nil!.chomp
else
replacement = @right
end
before + replacement + after
end
def to_s(io)
io << "Rule<#{@left.inspect}::=#{@right.inspect}>"
end
def process_backslashes(s)
s.gsub(/\\./) do |m|
if m[1] == 'n'
"\n"
else
m[1]
end
end
end
end
Let's see how nice this is now. We can start with this dice rolling program in improved Thue:
_::=~How many dice do you want to roll (0-9)?\n
<A::=:::
<B::=~Here are your rolls:\n
0::=B
1::=B@
2::=B@@
3::=B@@@
4::=B@@@@
5::=B@@@@@
6::=B@@@@@@
7::=B@@@@@@@
8::=B@@@@@@@@
9::=B@@@@@@@@@
^@::=^%
%::=~1\n
%::=~2\n
%::=~3\n
%::=~4\n
%::=~5\n
%::=~6\n
^$::=~Thank you for playing ThueDice!\n
::=
^<<_A$
$ crystal thue_nl.cr examples_nl/many_dice.thue
How many dice do you want to roll (0-9)?
4
Here are your rolls:
2
5
6
3
Thank you for playing ThueDice!
$ crystal thue_nl.cr examples_nl/many_dice.thue
How many dice do you want to roll (0-9)?
1
Here are your rolls:
4
Thank you for playing ThueDice!
$ crystal thue_nl.cr examples_nl/many_dice.thue
How many dice do you want to roll (0-9)?
9
Here are your rolls:
3
6
6
2
3
6
3
3
1
Thank you for playing ThueDice!
We can also see the new debug output:
$ DEBUG=1 crystal thue_nl.cr examples_nl/many_dice.thue
Rule<"_"::="~How many dice do you want to roll (0-9)?\n">
Rule<"<A"::=":::">
Rule<"<B"::="~Here are your rolls:\n">
Rule<"0"::="B">
Rule<"1"::="B@">
Rule<"2"::="B@@">
Rule<"3"::="B@@@">
Rule<"4"::="B@@@@">
Rule<"5"::="B@@@@@">
Rule<"6"::="B@@@@@@">
Rule<"7"::="B@@@@@@@">
Rule<"8"::="B@@@@@@@@">
Rule<"9"::="B@@@@@@@@@">
Rule<"^@"::="^%">
Rule<"%"::="~1\n">
Rule<"%"::="~2\n">
Rule<"%"::="~3\n">
Rule<"%"::="~4\n">
Rule<"%"::="~5\n">
Rule<"%"::="~6\n">
Rule<"^$"::="~Thank you for playing ThueDice!\n">
Applying rule Rule<"_"::="~How many dice do you want to roll (0-9)?\n"> at 3 to "^<<_A$"
How many dice do you want to roll (0-9)?
Applying rule Rule<"<A"::=":::"> at 2 to "^<<A$"
2
Applying rule Rule<"2"::="B@@"> at 2 to "^<2$"
Applying rule Rule<"<B"::="~Here are your rolls:\n"> at 1 to "^<B@@$"
Here are your rolls:
Applying rule Rule<"^@"::="^%"> at 0 to "^@@$"
Applying rule Rule<"%"::="~4\n"> at 1 to "^%@$"
4
Applying rule Rule<"^@"::="^%"> at 0 to "^@$"
Applying rule Rule<"%"::="~5\n"> at 1 to "^%$"
5
Applying rule Rule<"^$"::="~Thank you for playing ThueDice!\n"> at 0 to "^$"
Thank you for playing ThueDice!
No more matchesn. Final state: ""
Interestingly this kind of debug output is much more difficult in the original Thue, as we'd be printing debug information between line contents and its newline.
And of course we can now have correct Hello, World! that prints that final newline
_::=~Hello, World!\n
::=
_
$ crystal thue_nl.cr examples_nl/hello.thue
Hello, World!
We can also fix the program to say hello to someone, these are updated rules, the other 166 are the same:
!<::=~Hello,\s
^*$::=~!\n
$ crystal thue_nl.cr examples_nl/hello2.thue
Alexander
Hello, Alexander!
Designing a better Thue
Now we face much bigger issue. Thue programs need crazy amount of repetition as every letter we pass needs its own rule. So for example the program that asks for your name (letters only), has rules like these 52 times each, once for every lowercase and uppercase letter:
a<::=<a
^*a::=^*>a
>a::=~a
FizzBuzz program likewise had rules like these, multiplied for every digit:
0✅::=✅0
🍙0::=0🍙
🖨0::=0🖨📄0
📄0::=~0
There were also 10 rules for adding 1 to a digit, which aren't quite the same, but there's a lot of redundancy:
0🧲::=✅1
1🧲::=✅2
2🧲::=✅3
3🧲::=✅4
4🧲::=✅5
5🧲::=✅6
6🧲::=✅7
7🧲::=✅8
8🧲::=✅9
9🧲::=🧲0
We'd really like to deal with this copypasta. The most obvious idea is to allow character ranges like [0-9]
or [a-zA-Z]
. But then would we also have to do parentheses ()
and backreferences \1
so we get rules like these?
([a-zA-Z])<::=<\1
So I had a different idea. The only thing we really need to match is character classes, so why not do this:
[a-zA-Z]<::=<[a-zA-Z]
Doing it like that has nice additional benefit that we can do shifted character classes very neatly, like these rules for addition:
[0-8]🧲::=✅[1-9]
9🧲::=🧲0
We can also repeat the same class multiple times:
🖨[0-9]::=[0-9]🖨📄[0-9]
This supports any N-to-N, and N-to-1 mapping. If you have something more complicated, then you'll need to be creative and use multiple rules. I think this is great common ground, where you don't get any crazy additional powers, but it also saves you from pointless copy&paste programming.
And to make it really clear, all such rules are expanded into their regular form. So 🖨[0-9]::=[0-9]🖨📄[0-9]
actually becomes 10 regular Thue rules (well regular plus fixed newline handling).
An extra feature that I didn't really think about, is that you could have a range on the right side only, which in effect means random choice.
Some programs in Better Thue
Before I show the whole interpreter, let's just go through a few programs, as they're much more concise.
Here's the dice rolling program, it has 6 rules of what ROLL
can turn into:
^roll::=^ROLL
^,::=^COMMA
^$::=~
ROLL::=~[1-6]
COMMA::=~,
::=
^roll,roll,roll$
$ crystal thue_rx.cr examples_rx/many_dice2.thue
1,5,3
Loop uses triple range rule (🖨[0-9]::=[0-9]🖨📄[0-9]
) and different range rule ([0-8]🧲::=✅[1-9]
):
# rules for adding one to a digit ::=
# only 9 is special ::=
[0-8]🧲::=✅[1-9]
9🧲::=🧲0
# if extra carry arrived, add leading 1 ::=
^🧲::=^✅1
# we're done, so just let ✅ pass through ::=
[0-9]✅::=✅[0-9]
# print the result ::=
# first activate each letter with 📄, but leave a copy behind ::=
^✅::=^🖨
🖨[0-9]::=[0-9]🖨📄[0-9]
# now print the activated letter ::=
📄[0-9]::=~[0-9]
# once printing finished, switch to increment mode 🧲
🖨$::=🏁🧲$
🏁::=~
::=
^🖨1$
$ crystal thue_rx.cr examples_rx/addone.thue
419
420
Hello says hello to you. It's the program that benefited the most:
_::=:::
# let the < pass through to the left ::=
[a-zA-Z]<::=<[a-zA-Z]
# if < reached ^ then we are ready to start printing ::=
# we need this step so it waits for user input before it starts to print ::=
^<::=^*!<
!<::=~Hello,\s
# activate print letter command ::=
^*[a-zA-Z]::=^*>[a-zA-Z]
# execute print letter ::=
>[a-zA-Z]::=~[a-zA-Z]
# we're done, print exclamation mark and newline ::=
^*$::=~!\n
::=
^_<$
$ crystal thue_rx.cr examples_rx/hello2.thue
Bitcoinella
Hello, Bitcoinella!
Loop loops forever:
# rules for adding one to a digit ::=
# only 9 is special ::=
[0-8]🧲::=✅[1-9]
9🧲::=🧲0
# if extra carry arrived, add leading 1 ::=
^🧲::=^✅1
# we're done, so just let ✅ pass through ::=
[0-9]✅::=✅[0-9]
# print the result ::=
# first activate each letter with 📄, but leave a copy behind ::=
^✅::=^🖨
🖨[0-9]::=[0-9]🖨📄[0-9]
# now print the activated letter ::=
📄[0-9]::=~[0-9]
# once printing finished, switch to increment mode 🧲
🖨$::=🏁🧲$
🏁::=~
::=
^🖨1$
$ crystal thue_rx.cr examples_rx/loop.thue | head -n 20
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
And FizzBuzz obviously FizzBuzzes:
# rules for adding one to a digit ::=
# only 9 is special ::=
[0-8]🧲::=✅[1-9]
9🧲::=🧲0
# if number ended but extra carry arrived, add leading 1 ::=
^🧲::=^✅1
,🧲::=,✅1
# If we reach a comma that means another number starts ::=
# we also want to increment it so switch back to 🧲 mode ::=
,✅::=🧲,
# we're done, so just let ✅ pass through ::=
[0-9]✅::=✅[0-9]
# start the printer ::=
# it's simpler to look at both counter at once ::=
# we reset the counters, then start printer in proper mode ::=
^✅::=^📆
📆1,1,::=1,1,🖨
📆2,1,::=2,1,🖨
📆3,1,::=0,1,🍙📄F
📆1,2,::=1,2,🖨
📆2,2,::=2,2,🖨
📆3,2,::=0,2,🍙📄F
📆1,3,::=1,3,🖨
📆2,3,::=2,3,🖨
📆3,3,::=0,3,🍙📄F
📆1,4,::=1,4,🖨
📆2,4,::=2,4,🖨
📆3,4,::=0,4,🍙📄F
📆1,5,::=1,0,🍙📄B
📆2,5,::=2,0,🍙📄B
📆3,5,::=0,0,🍙📄X
# if we have 📄F, 📄B, or 📄X, we just print that ::=
# then we pass 🍙 to the right without printing number ::=
📄F::=~Fizz
📄B::=~Buzz
📄X::=~FizzBuzz
🍙[0-9]::=[0-9]🍙
# print the result if we're in number print mode ::=
# first activate each letter with 📄, but leave a copy behind ::=
🖨[0-9]::=[0-9]🖨📄[0-9]
# now print the activated letter ::=
📄[0-9]::=~[0-9]
# once printing finished, switch to increment mode 🧲
🖨$::=🏁🧲$
🍙$::=🏁🧲$
🏁::=~
::=
^0,0,0🧲$
$ crystal thue_rx.cr examples_rx/fizzbuzz.thue | head -n 20
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
For FizzBuzz it looks like the list of 15 remainders combinations could use some more complex regexps to be made more readable, but I guess that's a puzzle you'll need to solve yourself.
Final Source Code
Here's the final source code:
class ThueRule
getter :left
def initialize(@left : String, @right : String)
@right = "~\n" if @right == "~"
end
def apply(str, idx)
before = str[0, idx]
after = str[idx+@left.size .. -1]
if @right[0] == '~'
print @right[1..-1]
replacement = ""
elsif @right == ":::"
replacement = STDIN.gets.not_nil!.chomp
else
replacement = @right
end
before + replacement + after
end
def to_s(io)
io << "Rule<#{@left.inspect}::=#{@right.inspect}>"
end
end
class ThueSideParser
getter :results
@toparse : Array(Char)
def initialize(@str : String)
@results = [""]
@toparse = @str.chars
parse
end
private def parse
until @toparse.empty?
case @toparse[0]
when '['
chars = parse_range
if @results.size == 1
@results = chars.map{|c| @results[0]+c}
elsif @results.size == chars.size
@results = @results.zip(chars).map{|s,c| s+c}
else
raise "Sizes of character classes mismatch in #{@str}"
end
else
c = parse_character
@results = @results.map{|s| s + c}
end
end
@results
end
private def parse_character
if @toparse[0] == '\\'
@toparse.shift
raise "Unmatched \\ in #{@str}" if eos?
c = @toparse.shift
case c
when 'n'
'\n'
when 's'
' '
else
c
end
else
@toparse.shift
end
end
private def parse_range
chars = [] of Char
@toparse.shift
loop do
raise "Character range never closed in #{@str}" if eos?
if @toparse[0] == ']'
@toparse.shift
return chars
end
c = parse_character
raise "Character range never closed in #{@str}" if eos?
if @toparse[0] == '-'
@toparse.shift
e = parse_character
raise "Invalid character range in #{@str}" if e < c
chars.concat(c..e)
else
chars << c
end
end
end
private def eos?
@toparse.empty?
end
end
class ThueRuleParser
def initialize(@str : String)
if @str =~ /\A(.*)::=(.*)\z/
@valid = true
@left = $1
@right = $2
else
@left = ""
@right = ""
@valid = false
end
end
def valid_rule?
@valid
end
def empty_rule?
@valid && @left.empty?
end
def call
lefts = ThueSideParser.new(@left).results
rights = ThueSideParser.new(@right).results
# Support N-to-1 and 1-to-N rules
lefts *= rights.size if lefts.size == 1
rights *= lefts.size if rights.size == 1
unless lefts.size == rights.size
raise "Mismatched side of rule #{@str}"
end
lefts.zip(rights).map do |left, right|
ThueRule.new(left, right)
end
end
end
class ThueProgram
def initialize
@rules = [] of ThueRule
@initial = ""
@state = ""
end
def load(path)
lines = File.read_lines(path).map(&.chomp).zip(1..)
while lines.size > 0
line, line_no = lines.shift
# Ignoring invalid rules, they are sometimes used as comments
parser = ThueRuleParser.new(line)
next unless parser.valid_rule?
break if parser.empty_rule?
@rules.concat parser.call
end
@initial = lines.map(&.first).join("\n")
end
def random_match
match = nil
matches_count = 0
@rules.each do |rule|
idx = 0
loop do
match_idx = @state.index(rule.left, idx)
break unless match_idx
idx = match_idx + 1
matches_count += 1
next unless rand(matches_count) == 0
match = {rule: rule, idx: match_idx}
end
end
match
end
def run(debug)
@state = @initial
if debug
@rules.each do |rule|
STDERR.puts rule
end
end
while match = random_match
rule = match[:rule]
idx = match[:idx]
if debug
STDERR.puts "Applying rule #{rule} at #{idx} to #{@state.inspect}"
end
@state = rule.apply(@state, idx)
end
if debug
STDERR.puts "No more matches. Final state: #{@state.inspect}"
end
end
end
unless ARGV[0]
STDERR.puts "Usage: #{$0} <file.thue>"
exit 1
end
prog = ThueProgram.new
prog.load(ARGV[0])
# Crystal doesn't handle SIGPIPE well and we want to support:
# crystal thue.cr examples/fizzbuzz.thue | head -n 100
begin
prog.run(!!ENV["DEBUG"]?)
rescue e : IO::Error
exit if e.os_error == Errno::EPIPE
raise e
end
Next steps
The biggest issue with this interpreter is that every step checks every rule, so it's O(number of rules) * O(state size).
It would be fairly straightforward to compile list of states into a finite automaton without backtracking, even if specifics (uniformly random match) are a bit different from the usual matching algorithm, so it's technically not a DFA or an NFA.
Overall I think this language is a much better Thue than the original Thue. Writing actually programs would be just as much of a challenge, but amount of copying and pasting is dramatically lower.
Another Thue with regexps language is Rue, but it supports unlimited regexps, and that makes some coding challenges pretty much trivial.
As for coding it all in Crystal, it was generally a positive experience. The code contains very few pointless type annotations, and I didn't need to torture it into a different shape than I wanted just to make the type checker happy. I mostly chose it over Ruby, as I wanted to do the very fast finite automaton, but in the end I didn't quite get there.
Code
All code examples for the series will be in this repository.
Code for the Better Thue Interpreter in Crystal episode is available here.
Top comments (0)