The PowerScheme Interpreter


Table of Contents

  1. Introduction
    1. Hello
    2. Lisp in Small Pieces
    3. Repetition
    4. Different Languages
    5. About PowerShell
    6. Other Scheme Projects
    7. About Lisp
  2. Lisp and Scheme
    1. Intro
    2. Short History of Lisp
    3. Short History of Scheme
    4. Scheme as a Language to write Interpreters for
    5. Scheme Properties and Features
      1. Interpreted / Compiled
      2. Syntax: S-Expressions, Case-sensitiveness, Parentheses
      3. Types: data and functions
      4. Scope: lexical scope
      5. Evaluation: eager evaluation
      6. Garbage collection
    6. More about Scheme
  3. About PowerShell
    1. Introduction
    2. History of PowerShell
      1. Before PowerShell: Windows
      2. Before PowerShell: UNIX
      3. The Purpose of PowerShell
      4. The Timeline of PowerShell
    3. Characteristics of PowerShell
      1. Influences of other shells and programming languages
      2. Types
      3. Commands, Functions and Methods
      4. Scope: dynamic scope
    4. Criticism
    5. Impressions
  4. Features
    1. Case-Insensitivity of Symbols and Variable names
    2. Homoiconicity
    3. Type System
    4. Lexical Scope and Dynamic Scope
      1. Lexical Scope
      2. Dynamic Scope
      3. Conclusion about Lexical and Dynamic Scope
    5. First-class Functions and Closures
      1. What is being First-class Citizen?
      2. Higher-order Functions
      3. Anonymous Functions
      4. Closures
    6. Tail-call Optimization
  5. Functionality
    1. PowerScheme is a Scheme interpreter
      1. Interactive mode
      2. Non-interactive mode
    2. Lexical and Dynamic Scope in PowerScheme
    3. Exception Handling
  6. Implementation Details
    1. The Structure of the Code
    2. The Entry Point
    3. The Tokenizer
    4. The Bag
    5. The Parser
    6. The Evaluator
      1. Types
      2. Functions
      3. Local Functions and Thunks
    7. The Environment
      1. Lexical and Dynamic Environments
      2. Differences between Lexical and Dynamic Scope
      3. Tail Call Optimization
      4. Environment Representation
      5. Branching Stacks
      6. Define-Defines
      7. Global and Local Arrays
      8. Declare and Update for Lexical and Dynamic variables
    8. Conclusion about the implementation details
  7. What Else PowerScheme can do
    1. A REPL
    2. Y-Combinator
    3. Function Objects
  8. What is missing?
    1. Vectors
    2. Let*
    3. Caar and so on
    4. Set-car! and Set-cdr!
    5. Continuations
    6. Exceptions
    7. Macros
  9. Conclusion
  10. Bibliography

Introduction

Hello

Hello. In this post I would like to describe the Scheme interpreter I
wrote using the PowerShell programming language. I am still a
beginner when it comes to Lisp and to programming in general, so I
decided to write a simple interpreter.

This program is open source and everyone is free to do anything with
it. So here is the link:
https://github.com/naens/scheme-lisp/tree/master/chap02pwsh. The
license is GNU GPL v3.0. Of course, nobody will want to buy it or use
it for critical applications; it's here just for fun.

Lisp in Small Pieces

I am following the book Lisp in Small Pieces (often abbreviated as
LiSP) [queinnec_lisp], written by Christian Queinnec. It shows how
Scheme and Lisp in general work and how to write different kinds of
Scheme interpreters and compilers.

Repetition

In the book the author says that "Good teaching demands a certain
amount of repetition" (page xiv of my edition). So, in order to avoid
cheating, I have to implement Scheme as many times as the author
does.

Different Languages

Because I would like to avoid cheating as much as possible, simply
rewriting the interpreter several times is obviously not enough.
Over time it would all start to feel the same: I would remember the
variables, the files, how things are implemented, and so on. To
avoid this problem I decided to make it so that each time it would
seem that I am experiencing the implementation of the interpreter or
the compiler for the first time. Is there a way to do this? Of
course. The solution is to implement each interpreter and compiler
in a different language. That's why the interpreter I wrote for the
first chapter was in Kotlin. I had never written anything in Kotlin,
let alone a Scheme interpreter! At the same time it was a good
opportunity to learn a new programming language and a new development
environment (I used IntelliJ IDEA, which was not totally new to me
because I had used it with Java before, but with Kotlin it was
definitely new).

I have just finished the second chapter of the book, which was about
different environments. It talked mainly about the function
environment and the dynamic environment.

It's interesting how much more complicated everything gets simply by
adding a separate namespace for functions, when the only advantage is
improved performance.

About PowerShell

So, this second chapter I decided to do in PowerShell. It's
interesting that it tries to fill the gap between shells and
programming languages; the result, I would say, is closer to the
latter. It also attempts to solve problems that existed in the UNIX
shells, like the difficulties with string literals and the text-only
pipe, which makes programs interdependent. It also supports
arithmetic expressions, which were a problem in many earlier shells.

Because PowerShell became available on Linux and is now stable, I
decided it was a good opportunity to learn something completely new
to me, with nothing quite like it among other mainstream languages.

In the end it seems PowerShell is a good programming language, and it
has a lot of features that make writing a Scheme interpreter really
easy. It has all the libraries and data structures needed, it
doesn't expect the programmer to free all the memory used, and, in
my opinion, it's actually fast. I didn't compare it with other
programming languages, but for an interpreter that interprets an
interpreter the speed is amazing.

Other Scheme Projects

After I finish this project, which I decided to call PowerScheme for
obvious reasons, I will continue to write interpreters in other
languages, and I know that it's perfectly possible to write them all
without using the same language twice. I think I'll avoid some
languages that need a lot of work to be mastered, languages like APL,
Malbolge, Haskell, and similar. But it's almost certain that I will
use Prolog, Assembly and PL/M at some point.

About Lisp

So this program is an implementation of Scheme, a programming language
of the Lisp family. The first implementations of Lisp were
interpreters; compilers were written later. Lisp has a very rich
history, mainly because it has always been a language in which people
could experiment. The language is said to be homoiconic, which means
that the internal representation of data matches its lexical
representation. This makes it easy to manipulate the language as
data, which is why things like code generators and macros can be
easily written in Lisp.

And because of this richness, only a small part of everything that
ever existed in the history of Lisp has to be implemented. On the
other hand you can decide to implement anything you want. For
example, if you find some obscure old feature that existed somewhere
in a programming language (it doesn't have to be Lisp), you can
implement it.

In my case such a feature was dynamic variables. Following the book,
I implemented both lexical and dynamic environments, but I decided
not to do it exactly like the book: the user has the choice to
declare variables as lexical or dynamic. My implementation is, in my
opinion, closer to Common Lisp, but I will say more about it later,
in the section on functionality.

There are other features that I did not implement. Vectors, for
example, are not really needed in Lisp: they only make certain
operations faster, and everything they do can be done using lists.

Another thing I didn't implement is continuations. I thought they
were important to implement, but when I had implemented the whole
language and thought about how I would add continuations, I realized
that a lot would have to be rewritten. So that means, first, no
continuations this time; second, if you want continuations in your
language, you should think ahead about how to implement them,
preferably before writing any code. There will be a small section
about continuations, but I will describe them in more detail when I
write about an interpreter where they are actually implemented.

By the way, I didn't write a post like this about the first
interpreter, the one written in Kotlin, because it's really simple
and basic. Also, I didn't yet have the idea of writing a post about
each interpreter I write for this book, and going back in time to
write a text as though I had just written the interpreter doesn't
seem right either.

Lisp and Scheme

Intro

The language that this interpreter, PowerScheme, interprets is
Scheme. Scheme is a programming language born from Lisp. It can be
said that Scheme is a dialect of Lisp, as it continues the tradition
of Lisp and has a lot of traits in common with other languages of the
Lisp family, but at the same time Scheme stands a little apart from
other Lisps. The creators of Scheme were not really preoccupied with
whether existing Lisp programs could be adapted to Scheme or not.
They also decided to drop a lot of features that had been introduced
at some point but were not really useful in practice. Indeed, Scheme
is often considered a very compact language with a short
specification. So instead of making something complex but useful for
practical tasks, they made a language that looks better from the
theoretical point of view and is better suited to exploring
theoretical concepts.

Scheme is considered a functional programming language, although the
notion of a functional programming language has evolved over time.
At first it simply meant that the language can manipulate functions.
This notion seems to have acquired its modern meaning around the time
Scheme was released in 1975, so Scheme is one of the first functional
programming languages in the modern sense of the word. Today being
functional means focusing on immutability and avoiding side effects,
jumps, and loops.

Short History of Lisp

So, Scheme is a descendant of the Lisp family of programming
languages and inherited several of its features. For example, the
syntax, and the names and meanings of many of its keywords, come from
Lisp.

Lisp is one of the first programming languages ever created. At the
very least, it is the first successful programming language that was
not an abstraction of the existing hardware: it was based on theory,
such as logic and the lambda calculus.

Lisp appeared in 1958. Only one other language from that time still
survives: FORTRAN. Because of its origins in logic, Lisp helped
define several programming concepts that were first introduced in
Lisp. These include, for example, closures, macros, anonymous
functions, and the manipulation of functions as data in general.

Over time, many of the ideas that were already in Lisp were
incorporated into other families of languages. It's interesting that
many languages are still based on ideas that were already there in
the 1960s. For example, there was a comparison of the Go programming
language with Algol 68 [given_re:_2009], and it seems Go doesn't
bring anything new. So it can be said that the evolution of
programming languages since the beginning of the 1970s has just been
a re-combination of ideas that existed before, with more and more of
the ideas from Lisp being incorporated.

Short History of Scheme

Scheme appeared in 1975 as an attempt to combine everything good that
existed in Lisp with the lexical scope that originated in Algol,
while eliminating everything that was not needed.

Scheme had a lot of success because it could be used to teach
programming concepts. For many years it was used in many
universities as the language for introductory programming courses.
This is because of its advantages as a language where the machine is
much less visible than in other languages (after all, many languages
of the Algol family, like C or Pascal, are just abstractions of the
machine language). So it helped students focus on algorithms and
theoretical concepts without the need to explain the details of how
pointers or two's complement integers work.

Today there exist several implementations of Scheme. The most
important are Guile, which is used for the GNU project, Racket, one of
the most popular implementations of Scheme, and Chicken.

Scheme today has a standard, defined in the so-called Revisedⁿ
Reports on the Algorithmic Language Scheme (RnRS), so the
implementations that follow it are mostly compatible with each other.

Scheme as a Language to write Interpreters for

Writing a Scheme interpreter is a popular exercise; there exist many
implementations in many languages. This tradition began in the first
years of Lisp, when the author of the first specification of Lisp,
John McCarthy, came to the idea that it would not be difficult to
write an interpreter of Lisp in Lisp [graham_roots_2002]. Later Lisp
became more and more complex and feature-rich, so what could easily
be re-implemented became less and less similar to the actual Lisp.

One of the goals of Scheme was to return to this simplicity, which is
why Scheme became a popular programming language to write an
interpreter for.

Scheme Properties and Features

Interpreted / Compiled

When Lisp first appeared, it was an interpreted language, but then
people began to write compilers for it. Because Lisp can generate
new code during execution and then execute it, code can also be
interpreted at run time, so some parts of programs are compiled while
other parts are interpreted.

Scheme, too, is more often compiled than purely interpreted. Very
often Lisps are compiled not directly to machine language but to a
specific byte-code. Racket and Guile use byte-code, and Chicken
compiles to C, so it can generate native machine code executables.

Syntax: S-Expressions, Case-sensitiveness, Parentheses

The syntax of Scheme is based on s-expressions, like most other
Lisps. It continues the tradition of the car and cdr accessors for
a cons and shares many keywords with other Lisps, such as
lambda, let, and cond.
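
For readers new to Lisp, here is a quick sketch of these primitives
in standard Scheme (cons builds a pair; car and cdr read its two
slots):

(cons 1 2)          ; a single cons cell, printed as (1 . 2)
(car (cons 1 2))    ; => 1
(cdr (cons 1 2))    ; => 2

; a proper list is a chain of conses ending in the empty list:
(define l '(1 2 3)) ; same as (cons 1 (cons 2 (cons 3 '())))
(car l)             ; => 1
(cdr l)             ; => (2 3)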

Scheme can be case-sensitive or not. I prefer to make it
case-insensitive because I have used Common Lisp a lot, and I think
IF and LAMBDA should not be interpreted differently from if and
lambda. Scheme, too, was at first case-insensitive, but the R6RS
standard, published in 2007, specified that it is to be
case-sensitive. In many implementations it is an option that can be
enabled or disabled.

Types: data and functions

Scheme is a dynamically, strongly typed language. This means that
even though types are not specified statically for everything, they
are present during execution, and a value cannot suddenly be
interpreted as something else (as opposed to C, where it's possible
and totally normal to read a string as an integer, read a floating
point number as a pointer, and so on).

An interesting feature of Scheme (and of other Lisps) is that it
instantiates functions together with their creation environment. So
the same function expression evaluated two times can give two
different functions, depending on the environment of evaluation.
That makes it possible to let functions hold data and act as objects.
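
As a minimal sketch in standard Scheme: each call to make-counter
evaluates the same lambda expression in a fresh environment, so the
two resulting functions hold separate state, like two objects.

(define (make-counter)
  (let ((count 0))
    (lambda ()                 ; this lambda captures its own count
      (set! count (+ count 1))
      count)))

(define c1 (make-counter))
(define c2 (make-counter))
(c1) ; => 1
(c1) ; => 2
(c2) ; => 1, a different function with a different environment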

Scope: lexical scope

Scheme, like most programming languages today, uses lexical scope.
Lisp code is usually more readable in a lexically scoped Lisp,
because it is easy to see which instance of a variable refers to
what. The reason is that traditionally functional languages are
often read declaratively: it should be possible to say what the
program does without simulating an execution. This is the principle
of declarative programming. Dynamic scope goes against it by its
very definition: the value is whatever variable happens to be bound
to the symbol during this particular execution. So, for a Lisp that
positions itself closer to the functional style of programming,
lexical scope is much more natural.

Evaluation: eager evaluation

Scheme uses eager evaluation. That means that before a function is
called, all of its arguments have to be evaluated. In some
programming languages this is not the case; that is called lazy
evaluation. Normally, for purely functional languages, lazy and
eager evaluation should return the same value, except in the cases
where evaluation is impossible. In those cases lazy evaluation has a
higher chance of returning a value, because some things are not
evaluated if they don't have to be. But as lazy evaluation is more
difficult to implement and is generally slower, it is not often used
in programming languages.

Like many other languages, Scheme has some constructs that behave in a
way similar to lazy evaluation. These are the special forms if, and,
or, case, and cond. They can stop the evaluation under specific
conditions and return a value before evaluating all of their
arguments.
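
For example, in standard Scheme, and is enough to guard an expression
that would otherwise fail:

(define (safe-head x)
  (and (pair? x) (car x)))  ; (car x) is evaluated only if x is a pair

(safe-head '(1 2))  ; => 1
(safe-head 5)       ; => #f, and (car 5) was never evaluated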

Garbage collection

Like most languages of the Lisp family, Scheme typically uses a
garbage collector. The reason is that it creates a lot of cons
structures and leaves them in memory even when they are no longer
used. In order not to run out of available memory, and not to use
much more of it than needed, Scheme uses a garbage collector, which
returns these unused structures to the pool of memory available for
later use.

More about Scheme

In the end, after writing this interpreter, I see why a lot of people
prefer Scheme over Common Lisp. Common Lisp being a Lisp-2 is a
problem. It has some ugly characteristics that come from the fact
that the first position of a list is treated specially during
evaluation, by the function evaluator. So it has to have the
function wrapper in order to pass functions around as data. It's
much less clean. I think it's unfortunate that they standardized it
that way instead of making it like Scheme, which is much cleaner.
But Scheme is still strong: a lot of people still learn it and write
interpreters for it every year, even though it is used less in
education than before.

About PowerShell

Introduction

As the programming language for this project, I decided to use
PowerShell. It is a modern shell written for Windows, with some
interesting features. Many people say that it's a nice programming
language, so I decided to use it once in order to find out more about
it.

It now works on Windows and on Linux, which is my main Operating
System, and it is easy enough to install. When the source code was
first released it was not very stable yet (I tried to install it on
Arch Linux, but it didn't work). Now the installation is easy and it
works without needing additional packages, plugins, libraries, and
so on.

The main domain of use of PowerShell is administration of Windows
machines, and several of its qualities reflect that. For example,
the syntax is similar both to UNIX shells and to C#, in order to be
familiar to both UNIX and .NET developers. It can interact with the
components and APIs of the Operating System, which makes it a useful
tool.

History of PowerShell

Before PowerShell: Windows

The need for something like PowerShell had been there since the early
years of Windows, when it became sophisticated enough for daily use
and many administrative tasks needed to be automated.

Before PowerShell, another command line shell existed: cmd.exe. But
it had its own problems. First of all, it was not written with
Windows in mind, because it's nothing more than an adaptation of the
COMMAND.COM shell that existed on MS-DOS. So of course it could
accomplish certain things, but its features remained at the MS-DOS
level, and nothing had been done to make it especially useful with
Windows.

Here are some examples of the problems of cmd.exe. First of all,
when its predecessor, COMMAND.COM, appeared, it was already outdated:
it used GOTO for everything and didn't have functions. cmd.exe tried
to improve the situation, but it still remained limited.

One explanation of why cmd.exe is so uncomfortable for programming is
that it was at first primarily meant to execute .BAT files, which
were supposed to be lists of commands, not real programs. Additional
features were then introduced while maintaining backward
compatibility, which with time made it impossible to improve, so it
remained the way it was, almost unchanged, from the MS-DOS era.

Before PowerShell: UNIX

In the UNIX universe, the situation was somewhat better. Since the
introduction of the Bourne shell in 1977, UNIX has had a powerful
scripting environment, with loops, conditional expressions,
functions, and variables.

By combining the functionality of the shell with the multitasking
capabilities of the Operating System, and by using external programs,
it was easy enough to write relatively complex programs.

By the time Windows appeared, the UNIX shell had evolved, becoming
easier to use and gaining more features. The Korn shell appeared,
which was a big improvement in usability over the original Bourne
shell.

Unfortunately it had its own problems, even on its native Operating
System. It could only manipulate text, and the common way to do
things was to extract a specific sub-string by filtering the standard
output of an external program. This made both the scripts and the
programs interdependent: programs could not easily be modified
without breaking the scripts, so a program could not be improved, and
a field could not easily be added to or removed from its output. I
think this is what led to the POSIX specification, so that script
writers could be sure that the output of a certain program would not
change in the future.

On the Windows platform the problems were even bigger. It was not
usual to have GUI programs for administration tasks
[jones_learn_2017]; administration was usually done using libraries
and APIs. There were attempts to use UNIX shells; for example, it is
known that there was a Korn shell port, but in the end this was not
what could solve the problems.

The Purpose of PowerShell

So it was decided that Windows needed a new scripting environment,
different both from cmd.exe and from the UNIX shells.

Having learned from the experience of various other shells, the goal
set for the new shell was to give Windows some real scripting
capabilities. The new scripting language was thus supposed to act as
an interface to the Windows functions and to the data types of the
libraries and the API.

The way PowerShell has been made reflects its purpose: it can access
the Operating System's components, and it supports objects, which
also makes it easier to work with the OS, since .NET is built around
C#, an object-oriented language.

Another thing it tried to improve was to move away from being
text-based. Of course, being a shell, a lot of things are still
text: command line arguments, standard input, and standard output,
for example. But as much as possible it tries to work with objects.
One interesting example is the pipeline: by passing objects from one
program to another through the pipe, the need to parse the input from
the pipeline is avoided.

The Timeline of PowerShell

The history of PowerShell began in 2002, with the so-called Monad
Manifesto [the_devops_collective_monad_2016], a text describing the
need for a new shell for the Windows Operating System, what its
features should be, and what mistakes should be avoided.

The first release, PowerShell 1.0, was for Windows Vista, and at that
point it was still optional. It later became a part of the OS, being
included in Windows 7.

In 2016 it was Open-Sourced, and in 2018 it became possible to use
PowerShell on Linux. It is still not clear why they made it Open
Source. Possibly they wanted to attract new users, or to show that
their software is written in such a way that it can be easily ported;
or perhaps, in case they ever want to switch Windows to a Linux
kernel, they can be sure that enough people have tried it and there
will be no big problems with it on a different kernel.

For me it's a great opportunity to try something I would not have
tried otherwise, because I work on Linux, and it's not worth
installing Windows on a VM or on a real machine only to try a
programming language. I like it when programming language designers
try new or different approaches, so I wanted to try PowerShell from
the moment I heard it would work on Linux.

Characteristics of PowerShell

Influences of other shells and programming languages

PowerShell was influenced by several programming languages that came
before it. The first influence was the UNIX shells, namely the
Bourne shell and the Korn shell.

It borrows from them the syntax for variables, which have to be
prefixed with a $ sign. To make things more consistent, the $ must
appear every time a variable is used, as opposed to zsh, for example,
which on the contrary tries to let the user avoid $ whenever
possible.

Another influence was Perl. Perl was conceived as a programming
language based on the syntax of the UNIX shell, but its goal was to
be usable as a real programming language, so it was made easier to
use. Its main features were regular expression matching and
substitution. It can, for example, use a boolean match test in an
if, which is very valuable. This is something PowerShell can also
do, while still remaining in the domain of shells.

The third influence was C#. The authors knew that a lot of Windows
users work with C#, so they borrowed several syntactic elements from
it (for example, classes and arrays).

Types

Regarding types, I think PowerShell is more on the strict side
compared to other shells, which is not a bad thing. The main rule is
that when a conversion is made, no data should be lost (except when
floating point is involved).

This makes programming easier. When the data type is checked and
something is of a wrong type, no conversion is forced and an error is
reported. Another advantage is that it becomes easy to have an array
that contains objects or arrays, or an object that contains an array.
In some languages, like the UNIX shell, it becomes really complicated
if you want to do something similar.

Another interesting feature regarding types is the support for
objects. Even though PowerShell is not fully an object-oriented
language (the authors call it an object-based language), having
objects really helps make programs more structured. They are like
structs that can have functions, called methods, attached to them.

The support for objects is an important feature of PowerShell.
First, it allows interaction with the Operating System: a good part
of Windows has an object-oriented interface, so having objects makes
the translation between PowerShell and the Operating System easy.
Second, combining objects with the pipeline makes it easy to pass
data from one program to another.

Commands, Functions and Methods

PowerShell has two kinds of things that can be "called" or "invoked":
commands and methods. Commands are standalone, while methods are
part of a class and are invoked on an object.

There are four kinds of commands. The first is called a cmdlet,
pronounced "command-let". It is a function that is compiled into a
.NET class and made built-in, and it is the fastest kind of command
because it is compiled. I think this is an interesting feature: very
few shells allow extending the set of built-in commands.

Cmdlets follow a special naming convention, <Verb>-<Name>: a verb
followed by a noun, both starting with a capital letter and separated
by a dash (for example, Get-ChildItem).

The second kind of commands is functions. They are user-defined
functions that appear in PowerShell scripts.

The third kind of commands is script commands. This is when a whole
script is treated as a function. It's similar to bash: you can call
a file with arguments, and the file will receive the arguments given
by the user, as if the whole script were one big function.

The fourth kind of commands is native commands. These are files
completely unrelated to PowerShell that happen to be executable on
the Operating System, so PowerShell can ask the Operating System to
execute them. This is also a concept UNIX and Linux users know: it
corresponds to calling an external binary, like cat or mknod.

Scope: dynamic scope

From the point of view of scope, PowerShell continues the tradition
of dynamically scoped shells. The authors say that they considered
different strategies, but chose to make it dynamically scoped anyway.
There must be a reason why languages that deal with something
system-related so often end up being dynamically scoped.

Here are some other examples of dynamically scoped languages:

  1. UNIX shells.

    UNIX shells (sh, bash, ksh, zsh) are dynamically scoped to make things
    consistent between calling external programs and calling functions.
    When an external process is created, it receives a copy (or a
    copy-on-write view) of the environment, and when it finishes, what it
    did has no influence on the process that called it. This is the
    principle of dynamic scope.

  2. Emacs Lisp.

    It is not because the authors were lazy or didn't know how to
    implement lexical scope that Emacs Lisp is dynamically scoped. It is
    a decision that makes sense in the context of system-oriented
    programming: as Emacs is a little bit like an emulator of an
    Operating System, it makes sense to make it dynamically scoped. The
    advantage of dynamic scoping is that it is possible to set the
    environment before calling functions and to discard the changes
    after exiting the current function.

  3. Common Lisp.

    Common Lisp allows creating both lexically scoped and dynamically
    scoped variables. It's a very interesting example, because one of
    the authors of the Common Lisp specification is the person who made
    the first fully lexically scoped Lisp, Scheme. That means it has
    been acknowledged that lexical scope is not always the best, nor a
    solution for everything. Besides that, implementing dynamic scoping
    in a compiled language is not easy, because variables have to be
    looked up by name. In dynamic scope, we don't really know what is
    behind a variable name; it is only known during execution. That's
    the reason why it's called dynamic scope.

I give these examples here because all of them, in my opinion, apply
to PowerShell. It also allows calling external programs, and it is
used for the Operating System. It seems it is the right thing to do
in order to avoid passing too many parameters to functions.

Criticism

There are some things about PowerShell that not everybody likes. As
is often the case, programming language designers have to make
choices. For example, it's impossible to represent strings and
variables in the same way. Some languages, like C or Lisp, prefer to
write variables without special markers and strings within quotes.
Other languages, primarily scripting languages, do it the other way
around and have to mark variables differently, for example by
prefixing them with a $.

One of the bizarre things for someone who sees PowerShell for the
first time is that the operators for number comparison and logical
operations are not symbols but words prefixed by a -, like -gt,
-and, and so on. The reason for this is that the usual signs
conflicted with other operations, like >, which represents output
redirection, and the PowerShell authors wanted to keep those.

There is an inconsistency in the call syntax of commands and methods.
Commands are called without parentheses, with arguments separated by
spaces. Methods, on the other hand, must have parentheses, with the
arguments inside separated by commas. For some reason the authors
could not preserve consistency across all cases.

Impressions

Overall, I think it's an OK environment for simple tasks, exactly
what Operating System administration needs. I would even say it's
too powerful: do administrators really need to create complex data
structures, with modules, classes, and so on? But for programming
it's fine for testing simple algorithms, because the typing is strong
enough that it's easy to build hierarchical structures without
worrying that objects will be lost or merged somehow.

Features

Case-Insensitivity of Symbols and Variable names

My implementation of Scheme is case-insensitive. I know that this is
not common in modern programming languages, and that even Scheme
switched to being case-sensitive, but I decided to make it
case-insensitive anyway.
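
So in PowerScheme the following expressions are all meant to behave
identically (a sketch of the intended behavior of my interpreter):

(define foo 42)
FOO                     ; => 42, FOO and foo are the same symbol
(IF (> Foo 0) 'yes 'no) ; => yes, IF is the same special form as if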

When you write in Common Lisp, the REPL displays variables and
functions in uppercase, even though in the code they are lowercase.
It's the tradition, and it remained this way. That's why, when
you're familiar with Common Lisp, you very often see keywords like
LET, IF, LAMBDA, and COND in uppercase.

When I switched to Racket, which is an implementation of Scheme, this
was not really a problem, because I had always typed these keywords
in lowercase anyway.

But what I really don't want is for words like LAMBDA and IF to be
recognized as something different from lambda and if. Programmers
who used Common Lisp, at least CLISP and SBCL, have been trained to
think of them as the same thing. That's why it should remain this
way; it's a habit.

I think it's not natural to change the meaning of words based on
their case. Why would anyone give the same name to two variables
with the only difference being the case? If you want them to be
different, give them different names. Nowadays it is recognized that
longer names make programs easier to read: by looking at the verbal
representation of two variables you should be able to tell the
difference between them. If the only difference is the case, then
you have to remember why they are different. It's either a
convention (like constants having to be uppercase) or just random
(why not give this variable an uppercase name?). Either way, it's
avoidable with a better naming convention. That's why I see
case-sensitivity as a step backwards.

Perhaps they wanted to support different languages? That's also a
bad idea: characters may look the same while having different Unicode
code points. Overall, it's hard to find a justification for
case-sensitive symbols and variable names in Scheme. That's why my
implementation is case-insensitive, as was the case for earlier
versions of Scheme.

Homoiconicity

This implementation of Scheme, as is the case for most Lisps and all
Schemes, is homoiconic. This means that there is a correspondence
between the source code and the representation of the program in
memory [wikic2_homoiconic]. That is, the source code is a form of
AST, the structure that the interpreter will use to execute the
program.

This makes it very easy to print the program on screen, which is
impossible in many other languages: when a language is compiled, the
structure is lost, and with interpreters it might in many cases be
possible to restore it, but with some loss.

A good example for Lisp is that you can generate functions by
building lists and then evaluating them. If the lists are
correctly built, the functions will be evaluated correctly.
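
As a small sketch (this uses eval, which most Schemes provide; in
R7RS it also takes an environment argument, and I don't claim my own
interpreter supports it):

(define code (list '* 6 7))  ; build code as an ordinary list
code                         ; => (* 6 7), still just data
(eval code (interaction-environment)) ; => 42, now it's code

; the same trick can build a function:
(define fn-code '(lambda (x) (* x x)))
((eval fn-code (interaction-environment)) 5) ; => 25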

Lisp is not the only homoiconic language: Prolog and Tcl, for
example, are also homoiconic.

Besides generating source code and functions, homoiconicity is handy
for macros, which can be considered a slightly more structured way to
use eval on a list. Syntactically, macros are similar to functions;
the main difference is that the arguments of a macro are not
evaluated before the call. It's up to the macro to decide how to
evaluate them and whether they have to be evaluated at all. It can,
for example, explore their structure. That's how it is possible to
use macros to write a different define that behaves differently.

In my version of Scheme, macros are not implemented. Not because
they are difficult to implement (actually they are easier to
implement than functions, because you don't have to evaluate the
arguments). The main reason is that their behavior is complex, and
they are more of a Common Lisp feature than a Scheme one. And as the
syntax of my interpreter tries to stay as close as possible to
Scheme, it's difficult to test whether the behavior is correct.
Perhaps I'll implement them in a later interpreter.

Type System

My implementation of Scheme, as is the case for other Scheme
implementations and for most Lisps in general, is dynamically and
strongly typed.

Dynamic typing means that types are checked at runtime, which gives
more flexibility. A variable can contain values of different types
at different times, and a function can be called with different types
and still work, which makes Lisp well suited to polymorphism. A
function can be passed, for example, a list of values of any type
together with a function that can compare them, and that's enough to
implement a sorting algorithm that works with lists containing any
type.
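
As a sketch of this, here is an insertion sort in standard Scheme
that never looks at the type of the elements; only the comparison
function passed in does:

(define (insert x sorted less?)
  (cond ((null? sorted) (list x))
        ((less? x (car sorted)) (cons x sorted))
        (else (cons (car sorted) (insert x (cdr sorted) less?)))))

(define (insertion-sort lst less?)
  (if (null? lst)
      '()
      (insert (car lst) (insertion-sort (cdr lst) less?) less?)))

(insertion-sort '(3 1 2) <)              ; => (1 2 3)
(insertion-sort '("b" "c" "a") string<?) ; => ("a" "b" "c")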

Compared to Java, this type system is very easy to use and very
straightforward, whereas Java requires a complex interaction between
classes, types, interfaces, and methods to implement similar
functionality. And as my implementation shows, it is also really
easy to implement.

Another advantage is that variables don't need to be declared with a
type: the type is assigned and maintained dynamically. The variables
themselves, on the other hand, do need to be declared explicitly,
which makes Lisp somewhat stricter than several other languages.

Some Lisps, like Common Lisp, require an explicit mention of cases
where a declared variable is not used. I think it's an interesting
idea; it is also the case in Rust and in Prolog. But as a tradeoff
between several tendencies in the specification of Common Lisp, it is
not very consistent: there is, for example, still a way to have
optional function parameters, with a default value specified by the
user, in case the function is called without them.

Lisp is not only a dynamically typed language; it's a dynamically
typed language with strong typing. This means that the type of a
value is checked every time it is used, and different operations
require different types. For example, arithmetic operations require
numbers and cannot work with anything else, whereas car and cdr only
work with conses and produce an error if you try to use them with
other types of values. This is the opposite of what we see in
JavaScript, where it seems to be recognized by now that this was a
design mistake. It's not a bad thing when the language itself helps
the developer write stricter code, as we see in Prolog, Rust, and
Common Lisp, where unused variables are not allowed.

Sometimes Lisp is described as weakly typed, and the reason is that
the notion of strong and weak typing is not precisely defined. In
1974, for example, Liskov and Zilles considered a language strongly
typed if a function only accepts the authorized types. Today Lisp is
called strongly typed because, even if it happens only at runtime,
types are always checked and unauthorized types are rejected.

Another thing that makes the type system of Lisp not that strong is
the fact that several types are built out of conses. Conses are like
primitive building blocks that allow building any data structure. A
structure can be a list or a tree or whatever the programmer thinks
it to be, but to the compiler or interpreter it all looks the same:
just a cons with a car value and a cdr value, each of which can hold
any possible value.
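
For instance, the same kind of cells can be read as a flat list, an
association list, or a binary tree; only the programmer's intent
differs (standard Scheme):

(cons 1 (cons 2 '()))          ; the list (1 2)
(list (cons 'a 1) (cons 'b 2)) ; an association list ((a . 1) (b . 2))
(cons (cons 1 2) (cons 3 4))   ; a tree, printed ((1 . 2) 3 . 4)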

Nowadays the situation is a bit better. In modern implementations of
Lisp the user is allowed to define types, usually objects or records,
whose types can be checked at runtime. This makes it possible to
build complex data structures, like custom lists or trees, out of
these types, so not every complex data structure has to be a cons
anymore.

In my implementation, in order to keep everything simple and stupid,
there is no such thing as a record or an object. The only way to
link data structures is by using conses, so my Scheme is not very
strongly typed.

Lexical Scope and Dynamic Scope

Many programming languages have the concept of a variable. A
variable is usually something that represents a value and is referred
to by name. That means that one of the duties of programming
languages is the management of names. Variables are not the only
named entities in a program; there are also modules, classes, types,
structures, subroutines, and whatever else the language needs:
languages are free to add more kinds of entities if they need to.
But here the focus is specifically on the names of variables and
functions.

For this purpose there exists the notion of scope. A scope is the
region in which a certain name references a certain entity. When the
scope of a variable ends, we say it goes out of scope, which means
that if its name is used again, it either refers to a different
variable or its usage is illegal.

One way or another, names need to be resolved. Typically this
happens starting from the current point in the program, where the
name to be resolved occurs, and the search proceeds from there. For
this, there exist two main strategies: lexical scoping and dynamic
scoping.

What they both have in common is that they start the search at the
name and look outward through the extent of their scope. If the
variable is not defined in the current block, they need to go
further, and this is where they differ. Lexical scope looks for the
name based on the structure of the source code, or of the abstract
syntax tree, which is structurally equivalent (as it is a
representation of the source code). Dynamic scope, on the other
hand, uses the execution stack to find the value: it first looks in
the names of the caller, then the caller of the caller, and so on
until the value is found.

Both strategies have interesting properties, and both have advantages
and disadvantages. So here I will describe them in more detail.

Lexical Scope

Lexical scope, as its name implies, uses the lexical structure of the
program, which means that the way variables are resolved does not
depend on the contents of the stack. It can easily be determined at
compile time which operations need to be performed to find the value
of a given variable.

This means that names can be resolved at compile time, which is
typically when the code to access the variable is generated. This
makes the code faster, because the information about the names of the
identifiers can disappear: it is no longer needed, since what matters
is the path to access the value.

At the same time, lexical scope brings some implementation
difficulties. It is a rather difficult problem to generate compiled
code for a language that supports first-class functions. Indeed,
when a pointer to a function is created, one can say that it's a
function object: it contains not only the code that needs to be
executed, but also the variables associated with that piece of code,
not only local variables, which are created each time the function is
executed, but also non-local variables that reference the enclosing
environment. It is necessary to be able to keep references to them,
to read them, and to modify them.

There can be several function objects, and even several copies of the
same object, in a program at the same time. In this case it becomes
impossible to know statically how long these environments are needed.
For this reason, languages with such functions typically require
garbage collection.

From some point of view, everything is garbage collected. For
example, when a function returns, its local variables and parameters
are "collected" by popping them from the stack. Only in the case of
function objects does this simple strategy stop working, and a more
sophisticated garbage collector is needed.

One interesting property of lexical scope is that it makes code
easier to read. The variable referenced by a name is either in the
current block, or in the enclosing block, or in a block above that.
The fact that this is independent of the execution state of the
program makes it possible to read code as if it were declarative
programming.

For example, here is the classical function map as it can be
implemented in Scheme. It takes a list l and a function f and
produces a new list by applying f to every element of l. Each
element is obtained with car, as in (car l); the rest of the list is
obtained with cdr, as in (cdr l); and cons combines an element with
the rest of the result, keeping the elements in the same order.

(define (map l f)
  (cond ((null? l) '())
        (#t (cons (f (car l)) (map (cdr l) f)))))

As we can see, the code is very concise, and we clearly see which
variables are used and where they are defined. There is no way an
undeclared variable will come in from outside. This code is
self-contained.

Now let's see where lexical scope really shines: we'll use this map
function with a closure:

(define (map-mult l n)
 (writeln (map l (lambda (x) (* x n)))))

(map-mult '(1 2 3 4) 10)

We clearly see what each variable refers to; no need to search
further. Even if we don't know what map does, we still see what n is
and what x is.

Lexical scope also preserves referential transparency. This means
that an expression can be substituted with an equivalent expression
without changing the behavior of the program, at least as long as the
functions have no side effects. This way, a function call with given
parameters can be substituted with the value it returns without
modifying the behavior of the program. It also allows memoization:
remembering the value that a function returns for given arguments
and, instead of calling it again, returning the value it returned the
first time.
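
A sketch of memoization in standard Scheme, caching results in an
association list (this assumes f is referentially transparent and
takes one argument):

(define (memoize f)
  (let ((cache '()))                 ; remembered (argument . result) pairs
    (lambda (x)
      (let ((hit (assv x cache)))
        (if hit
            (cdr hit)                ; reuse the remembered result
            (let ((result (f x)))
              (set! cache (cons (cons x result) cache))
              result))))))

(define square (memoize (lambda (n) (* n n))))
(square 12) ; => 144, computed
(square 12) ; => 144, looked up; the lambda is not called again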

So here is an example to illustrate what referential transparency
means. Let's look at a power function, pow:

(defun pow (n e)
  (cond ((= e 0) 1)
        ((= e 1) n)
        (t (* (pow n (- e 1))
              n))))

Here the power function is defined by multiplying the number n by
itself e times. But if we define the power function differently, as
n^e = ((n^2)^d) * n^r, where e = 2*d + r and r is either 0 or 1, it
is possible to write another function that uses fewer
multiplications:

(defun pow2 (n e)
  (cond ((= e 0) 1)
        ((= e 1) n)
        (t (let ((r (mod e 2))
                 (d (/ e 2)))
             (* (pow2 (* n n) d)
                (pow2 n r))))))

Here we use the fact that the function * is referentially
transparent: we can achieve the same result using a different
definition of the function, calling * fewer times. If * were
referentially opaque, this probably wouldn't have worked; at the very
least it would have been difficult to be sure that the function works
correctly.

By the way, this code is written in Emacs Lisp, which is dynamically
scoped. The example shows that if we are careful, it's possible to
write referentially transparent code even in dynamically scoped
languages. Here, of course, it's not that difficult, because * is a
built-in function; in more complicated cases dynamic scope can become
a more serious problem.

Lexical scope also facilitates code optimization: referentially
transparent code can be analyzed and modified by the compiler more
easily, with a guarantee that the intended behavior of the program is
preserved.

So, these properties make it easier to reason about the code: the
code can be subdivided into parts because we know exactly what each
name refers to, and referential transparency makes it possible to
read code as a set of distinct expressions with a precise meaning.

Dynamic Scope

The other important scoping strategy is dynamic scope. Its main
characteristic, which distinguishes it from lexical scope, is the way
names are searched. When a dynamically scoped name needs to be
resolved, the search starts from the block of code where the name
occurs. Up to this point it is no different from lexical scope, and
if code is written carefully enough, in many cases it is possible to
write code that is portable between lexically and dynamically scoped
versions of the same language.

The difference from lexical scope starts when the name is not found
in the current block. Whereas lexical scope searches outward, in the
lexical environment, dynamic scope looks among the most recently
defined names that are still in scope.

Another important difference is that the scope ends with the block
that defined the variable or gave it a value. That means that a
value is valid as long as the block has not finished executing.
Called functions have the right to replace the value with their own
value temporarily, for as long as they execute. As soon as the block
finishes executing, its value for the variable is lost and the
previous value, if it exists, is restored.

One interesting property of dynamic scope is that a name reference
can refer to variables that are lexically defined in different
places. The called function cannot know which variable it
references; it depends on the execution state of the program.

Dynamic scope has a big disadvantage: code that relies heavily on it
is difficult to read, modify, and refactor. The reason is that when
dynamically scoped variables are referenced, several portions of code
become interdependent, and if one wants to isolate some code into a
module or a function, there is no way to know which other functions
use those variables.

Also, there is no referential transparency, which makes it difficult
to subdivide the code into pieces and to reason about each part
locally. As a result, the code can become unmaintainable.

In the past, many more languages used dynamic scope than today. The
first Lisp, for example, used dynamic scope, and it was only after
the first implementations that it became apparent that the behavior
of dynamic scope is not what readers of the source code expect.
Moreover, the lambda calculus, which had a lot of influence on Lisp,
used lexical scoping: variables could be bound to variables outside
the current function.

So here is an example that compares lexical and dynamic scopes:

(declare a 1)
(defun fn (b)
  (list a b))
(let ((a 2))
  (fn 3))

This code evaluates to (2 3). As you can see, when fn is called, the
dynamic variable a has been set to 2 by the let, overriding the
value 1.

Let's look at a similar example in Scheme:

(define a 1)
(define (fn b)
  (list a b))
(let ((a 2))
  (fn 3))

The result of evaluating this code is (1 3): we use the first a,
which is captured by fn. The other a, the one in the let, is a local
variable and, from a lexical point of view, has no relation to the
function.

So why can dynamic scope cause a problem? In the first example we
have no idea what a is. It can really be anything: a number, a
function, a pair… This is why it is so hard to refactor dynamically
scoped code: we need to inspect and test every instance where a
appears, and only then can we decide what we can do with the function
fn.

Dynamic scope also makes closures impossible: when the name of a
variable is looked up, the enclosing blocks are simply ignored.
Higher-order functions suffer from similar problems.

On the other hand, while not very suited to big projects, and despite
the difficulties it causes for refactoring, dynamic scope has its
place for some special uses. For example, it's useful for
temporarily setting some global parameters that one or several
functions should access. When the current block ends, all the
bindings are undone, but until then these parameters are accessible
to the functions called. One example of such use is when you have to
write dates to a file in a specific format, but once you have
finished writing, you don't need this format anymore. As long as the
function that writes to the file is alive, all the functions that
need to write a date, under the dynamic scope of that function, will
know which format to use.
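
Standard Scheme expresses exactly this pattern with parameters, which
are dynamically scoped variables (this is plain R7RS; PowerScheme
uses its own declare form instead):

(define date-format (make-parameter "YYYY-MM-DD"))

(define (write-date)
  ; every function called inside the parameterize below sees the new value
  (display (date-format))
  (newline))

(write-date)                          ; prints YYYY-MM-DD
(parameterize ((date-format "DD.MM.YYYY"))
  (write-date))                       ; prints DD.MM.YYYY
(write-date)                          ; binding undone: YYYY-MM-DD again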

Another important use of dynamic scope is backtracking. When testing
different possibilities, some settings can be set and the outcome
tested; on failure, other settings can be set and the function that
calculates the outcome can be called once again, and so on.

Dynamic scope is often used for exception handling. Indeed, in many
languages the handlers defined in a block of code are dynamically
scoped. These handlers are typically defined as blocks of code,
often called catch blocks. In Java, for example, there is a try
block, which contains calls that might throw exceptions, and there
are catch blocks that describe the steps to take when an exception
occurs. The code inside a catch block is called the exception
handler and can be viewed as a function that is executed when the
exception assigned to it is thrown. The code that actually throws
the exception can be far from the try and catch blocks, but
nevertheless, to find the handler that needs to be executed, the
rules of dynamic scope are followed.

Conclusion about Lexical and Dynamic Scope

So in the end we have seen what scope is and how it works. We have
also described the two main strategies for resolving names that are
not found in the current block. Even though it might seem that
lexical scope has won and is much more appropriate for programming,
dynamic scope is still very important, and many languages do not even
attempt to get rid of it; several languages even support both, which
shows the importance of dynamic scope. Here are some examples of
languages that use both:

  • Common Lisp is usually lexically scoped, but supports dynamically scoped variables, which are called special variables in Common Lisp.
  • Java and many languages that use exception handlers: the exception handlers are typically associated with exceptions using dynamic scope.
  • C macros: when macros contain free variables, these are resolved in something like dynamic scope. Macros work by substituting the macro body into the code, so it's not truly dynamic scope in the sense that nothing is looked up in the execution stack; but when these free variables are resolved in the scope where the macro is used, they behave as though they were dynamically scoped.
  • Scheme also, even though it is lexically scoped almost everywhere, uses dynamic scope in some input and output routines, for example in with-output-to-file.
  • Perl, a language that was dynamically scoped when it was created, later added lexical scope, so now it's up to the programmer to decide which scoping strategy to use.

It is also interesting to point out that lexical and dynamic scope
are not the only possible scoping rules, nor are they sufficient to
characterize the way variables are resolved by a language. A good
example is the difference between labels in many assembly languages,
which do not need forward declarations and can be used before being
declared, and variables in C, which need to be declared before being
used. In that sense, each variable declaration in C introduces its
own scope.

There can also be limitations on lexical scope: in Java, for example,
closures can only capture final (or effectively final) variables,
which means that other variables live in a somewhat more restricted
scope. More such examples are described by Éric Tanter in Beyond
Static and Dynamic Scope [tanter_beyond].

First-class Functions and Closures

What is being First-class Citizen?

Another important feature of my implementation, one that also usually
exists in implementations of Lisp, is first-class functions. It
means that functions, along with other types like integers,
characters, or booleans, are first-class citizens of the programming
language.
Concretely, a first-class citizen has several important properties.
It can be passed as an argument to a function. In order to do this,
the name of a function should not be usable only for calling the
function; it should also denote the function as a value that a
variable can hold. Or there can be other ways to assign a function
value to a variable, for example, using lambda expressions.

Another important property of first-class citizens is that they can
be returned from a function as a return value. This is certainly the
case for Scheme and Common Lisp: everything that can be manipulated
as a named variable can be returned. This is because Lisp has a
dynamic type system and does not know the type of the value it
returns, so if you have a way to make a variable reference something,
you are also able to return it.

The third property necessary for being a first-class citizen is
that it can be assigned. Here also, as Lisp is a rather simple
language, if you have a value, you are free to assign it to any
variable you want: any expression can be evaluated and its result
assigned to a variable.

And the last property for being considered a first-class citizen is
that it must be possible to tell whether two functions are equal.
This is rather difficult, first because function equality can have
several meanings: whether the functions have the same output for the
same input, whether their internal structure is the same, or whether
they have the same address in memory.

The execution time can also be taken into account. It is difficult
to tell whether two functions are equal because testing it would be
rather costly (except for testing the memory address, which is not a
good way). Also, in general, it is impossible to tell whether two
functions give the same output only by examining their source code.
I think in practice function equality is not used often. Languages
which do not provide a dynamic way to build functions, like C,
provide a more or less good approximation of equality by just
comparing memory addresses. In the case of Java, there is the
comparison of the types of objects: very often functions are put
inside a class, which makes such objects the equivalent of functions.
Good examples of such classes are Comparators and Listeners.
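
To make the first three properties concrete, here is a small sketch
in standard Scheme (using this interpreter's writeln for output): a
function value is assigned to a name, passed as an argument, and
returned as a result.

(define square (lambda (x) (* x x)))   ; a function value assigned to a variable

(define (twice f x)                    ; a function accepted as an argument
  (f (f x)))

(define (make-adder n)                 ; a function returned as a value
  (lambda (x) (+ x n)))

(writeln (twice square 3))             ; prints 81
(writeln ((make-adder 4) 5))           ; prints 9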

Higher-order Functions

An interesting feature of Lisp and many functional languages is the
support of higher-order functions. A higher-order function is a
function that accepts other functions as parameters and/or returns a
function as its return value. The two cases are slightly different
and are used differently. When a function is passed, it is to be
executed by the accepting function, so this case is supported even by
languages like C, which cannot capture the environment with closures.
When returning, on the other hand, C is very limited: it can only
return the memory address of an existing function, it cannot create a
new function, which is possible in functional programming languages.
So it can be said that C supports only half of the higher-order
function feature.

The usage of higher-order functions gives a lot of expressiveness and
flexibility to the language. It allows the programmer to abstract
actions and concepts. Higher-order functions can be used, for
example, to sort a list differently depending on the comparison
criterion: by passing different functions to the sorting function,
it's possible to change how it sorts.

Here are some examples of common usage of higher-order functions:

  • The map function is used to transform a list, creating a new list by applying a function to every element.
  • The filter function allows the creation of a new list from an existing list by keeping only the elements for which a user-supplied function returns true.
  • The fold function processes the whole list and creates a single value from that list. It can compute the sum, average, maximum, minimum, count, or anything the user desires.
  • Another example is to perform an action the way the user wants it. For example, an equality function can be passed to a function that checks whether an element is in a list. Not all equalities are the same, and sometimes it's very useful for the user to be able to tell exactly when two values are considered equal.

And here is an example of code of a higher-order function:

(define (fold-if list fold-fun filter-fun base-val)
  (cond ((null? list) base-val)
        ((filter-fun (car list))
         (fold-fun (car list)
                   (fold-if (cdr list) fold-fun filter-fun base-val)))
        (else
         (fold-if (cdr list) fold-fun filter-fun base-val))))

The function in this example, fold-if, takes as arguments a list, a
fold function, a filter function and a base value. It folds only the
values that return true when passed to the filter function.

Here is an example of this function being used:

(fold-if '(1 2 3 4 5 6 7 8 9 10)
         *
         (lambda (x) (= (modulo x 3) 0))
         1)

What it means is that given a list of 10 numbers, it applies the
multiplication function, starting from 1, to all members that are
divisible by 3. So it's basically the same as computing (* 3 6 9),
which returns 162.

Anonymous Functions

Another important feature of Lisp and of Scheme, is support of
anonymous functions, also known as lambda functions.

What it means is that they are just functions, but without a name.
Calling them lambda functions comes from lambda calculus, which is a
formal system that abstracts functions. The only object in lambda
calculus is the function and the only operation allowed is function
application. Functions there have no names.

The way functions are used in lambda calculus gives an example of
how anonymous functions can be used in programming languages: by
passing them as parameters.

Indeed, if the language does not support first-class functions,
anonymous functions can seem to be a completely useless concept: what
is the use of creating something you have no way to refer to?

Anyway, the concept of lambda functions was born before the first
programming language was made, and it is also the reason why Lisp, the
first language to use anonymous functions, used the lambda keyword to
refer to them.

But even if they are not supported, it's still possible to get their
full functionality by creating normal named functions and passing
them. Here lies another problem: the language has to support the
definition of local functions, because lambda functions are typically
used in a local context, where they can reference local variables
(which are, by the way, non-local to the lambda function). So simply
passing a top-level function will not provide the same functionality.

Another way anonymous functions are often used, is to return a
function as a function's return value. This way it's possible to
write a function that acts as a function factory, which creates
different functions based on arguments passed to it. This is
something not possible to do with just function pointers, because they
don't capture the environment.

So, here is a simple example of usage of an anonymous function in Scheme:

(define (fn f c)
  (+ (f 5) c))

(fn (lambda (x) (* x 100)) 3)

The code creates an anonymous function that simply multiplies its
argument by 100. The fn function calls it with the value 5 and adds
the second argument. So the result is 503 = (100 * 5) + 3.

Closures

Another important concept is the closure. A closure arises when there
is an internal function that has non-local free variables, which are
bound by the enclosing functions or blocks. It's not necessarily the
closest enclosing block; it can also be a block several levels higher.
The word closure comes from the fact that the free variables are
closed over by the surrounding block.

Here is an example of a closure in Pascal:

program test01;

function fn(c: integer): integer;
    function infn(x: integer): integer;
    begin
        infn := x * 100 + c
    end;
begin
    fn := infn(5)
end;

begin
    writeln(fn(3))
end.

This code uses an internal function infn that can access an external
variable c, which is a parameter of the fn function. It works
because the code does not need to create function objects. It only
needs to access the frame of the enclosing function on the stack.

Closures are typically associated with first-class functions, because
they are often used to return a function, which is a recurring pattern
in functional programming. But they can also be used in languages
without first-class functions. For example, in C, the global
environment can be seen as a closure over functions, because functions
can use global variables defined in it. There is also a similar
concept in Pascal, which supports nested functions, that is, functions
inside functions. A function on an inner level can use the variables
of the function on the outer level, and thus the outer level creates a
closure. However, internal functions cannot be returned, because when
returning such functions, their environment has to be somehow
preserved, which is not easy to implement in a language without
garbage collection.

Closures are often used for callbacks. That is, when the user
specifies a function to be called in certain circumstances: when an
exception, an interrupt, or an event occurs.

The funarg problem is associated with closures. When a function is
returned as a value from another function, it can refer to non-local
values. The problem is that all the functions created in this manner
must be able to share values if they refer to the same value, for
example when there are different closures over the same environment,
or when the user makes a copy of a closure. In this case, in order to
know when to free the environments no longer in use, the language (or
its runtime) needs a garbage collector.
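
As a small illustration of this sharing requirement, here is a sketch
in plain Scheme (assuming the usual pair operations) of two closures
returned from the same call that must keep referring to the same
variable:

(define (make-counter)
  (let ((c 0))
    (cons (lambda () (set! c (+ c 1)))   ; writer: modifies c
          (lambda () c))))               ; reader: must see the same c

(define counter (make-counter))
((car counter))                ; increment once
(writeln ((cdr counter)))      ; prints 1: both closures share one environment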

Here is an example of a higher order function that returns a function:

(define (make-multiplier n)
  (lambda (x)
    (* x n)))

This function needs to save within itself the parameter passed, n.
With different arguments it can create different functions. It can
be used in the following way:

(let ((mul5 (make-multiplier 5))
      (mul2 (make-multiplier 2)))
  (writeln (+ (mul5 6) (mul2 4))))

We create two functions: mul5 and mul2. mul5 returns its
argument multiplied by 5 and mul2 multiplies by 2.

So when this program is executed, it prints 38, which is (5 * 6) +
(2 * 4).

Tail-call Optimization

A Scheme compiler or interpreter cannot use the name Scheme if it does
not perform tail call optimization; that is how it's written in the
specification. Therefore my interpreter, even if it's really
minimalist and implements a very small subset of the Scheme
programming language, implements some rudimentary kind of tail call
optimization in order to be called Scheme.

What is the meaning of this term? When a function calls another
function, it does so by putting some values on the top of the stack.
At the very least this is the return address, and often it also
includes the function parameters.

When a function, just before returning, calls another function and
thus uses its result as the result of the current function, using
more stack space can be avoided. The purpose is to make it as though
the caller called one function, but another function returned. This
is usually done very easily, if the calling conventions are not too
complicated or badly designed. What needs to be done is to make the
last function called believe that it must return to the place the
current function must return to. So, instead of pushing a new return
address onto the stack, the current function jumps directly to the
called function. The details of what exactly must be done depend on
the calling convention, of course, like the way the parameters are
passed and so on, but the idea is always the same: perform the return
as though the caller called not the current function, but the last
function.

This idea does not depend on the stack being there. If we some day
find that the stack is outdated and a much better way exists to
implement function calls, the idea of tail call optimization might
still work.
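
As a sketch of what this buys us in practice, here is a
tail-recursive loop in Scheme; with proper tail calls it runs in
constant space, while without them a count this deep would exhaust
the stack (or, in this interpreter, the environment):

(define (count-down n)
  (if (= n 0)
      'done
      (count-down (- n 1))))   ; tail position: the call replaces the current frame

(writeln (count-down 100000))  ; prints DONE instead of overflowing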

Functionality

So now I am going to say a few words about the functionality of
PowerScheme: what it is, what it's designed to do, and a few of its
properties that make it different from usual Scheme implementations.
Sometimes I included a feature that I thought would be interesting,
sometimes I wanted to include something, but then realized that it
would be too complicated to code or not practical to use.

PowerScheme is a Scheme interpreter

One of the most obvious things is that, as a beginner, I don't start
with difficult things. I try to keep everything manageable, and
when, in the future, I know more about how things work, I will add
more advanced techniques and features. For this reason my Scheme
implementation is an interpreter. Interpreters are typically easier
to code than compilers, because compilers require knowledge about the
target architecture, optimization techniques and so on.

My interpreter can run in two different modes. The mode is determined
when the program starts and cannot be switched at runtime.

Interactive mode

The first mode is the interactive mode. PowerScheme is started in
this mode when the interpreter is invoked without arguments. That
means that it works like a REPL: the program waits for the user to
input lines of code (it doesn't do anything when the user inputs
individual characters), and then checks whether the input can be
interpreted. If the input contains a complete s-expression or a
single value, it interprets it and prints the resulting value on the
screen. In some cases, when there is an error condition during the
execution of the program, instead of printing a return value, the
interpreter prints an error message. This is done using PowerShell's
support for exceptions.

So here is how it's started. From the directory containing the source
code of the interpreter, simply invoke PowerShell by typing
pwsh -File PwScheme.ps1:

$ pwsh -File PwScheme.ps1
PowerScheme>(writeln "Hello, World!")
"Hello, World!"
PowerScheme>_

Now the interpreter can be used in interactive mode. In order to exit
there is a specialized built-in function exit:

PowerScheme>(exit)
$ _

Non-interactive mode

The second mode is the non-interactive mode. In order to enter this
mode the program must be started with an argument, which is
interpreted as a file name. The file referenced by the file name
must contain a program that can be interpreted by PowerScheme. This
program is then read and executed. When PowerScheme finishes the
execution of the program, it exits without doing anything else.

And here is an example of running in non-interactive mode, reading
the Scheme source code from a file in the tests directory named
y-combinator.scm. As you can see, after executing the program and
printing what the program told it to print, it exits immediately.

$ pwsh -File PwScheme.ps1 tests/y-combinator.scm
6
3628800
$ _

Lexical and Dynamic Scope in PowerScheme

Another decision I had to make was regarding the scoping rules. There
are two main ways the variables are looked up: the lexical scope and
the dynamic scope. My previous interpreter, that I wrote in Kotlin,
was exclusively lexically scoped. For this one, as I decided to write
it as an exercise for the chapter 2 of Lisp in Small Pieces, which
discussed the implementation of the dynamic scope, and I think it is
not discussed a lot further in the book, I decided to implement both
dynamic and lexical scope.

Scheme by its definition must be lexically scoped. When Scheme was
created, lexical scope was considered a foreign feature that came
from Algol. So, in any case, it was not a good idea to leave lexical
scope completely out; that's why I decided to do both. The book
shows implementations that use specialized keywords for declaring and
accessing the dynamic scope. Now I think it's a good idea: it would
be weird to want to modify a variable and then find out that it's
actually dynamically scoped. For this reason, I think, there is a
convention in Common Lisp to distinguish dynamic variables from the
rest by prepending and appending an asterisk to the name of the
variable.

I did it more or less the way it's done in Common Lisp: dynamic
variables are accessed the same way lexical variables are. For the
declaration, both of variables and functions, I use the keyword
dynamic. This can lead to problems if the user tries to use the
same name for a lexical and a dynamic variable. A better solution
would be to check systematically that they don't overlap, or to
impose a distinction in the name, like allowing only names between
asterisks to be legal for dynamic variables. This could still lead
to confusion, though, because in Common Lisp such variables are all
in the global scope, but in my implementation they can be defined at
any level.

Here is a little program I have written in order to test how my
implementation of the dynamic scope works:

(dynamic y 1)
(define (f x)
 (writeln (list 'f y))
 (set! y (* x 10))
 (writeln (list 'f y))
 (g 7)
 (writeln (list 'f y)))
(define (g a)
 (writeln (list 'g y))
 (set! y a)
 (writeln (list 'g y)))

(writeln (list 'top y))
(set! y 3)
(writeln (list 'top y))
(f 5)
(writeln (list 'top y))

It prints the following, showing that even if functions modify the
variable y, they do it only in their scope and the previous value is
restored when they return:

(TOP 1)
(TOP 3)
(F 3)
(F 50)
(G 50)
(G 7)
(F 50)
(TOP 3)

Exception Handling

Another use of dynamic scope is for handling exceptions. So I added
to PowerScheme a very rudimentary exception handling feature in
order to test how it relates to dynamic scope. Let's say that each
error condition has a dynamically scoped function associated with
it. These can easily be defined by the user, for example this way:

(dynamic (*zero-crasher*) (writeln '*zero-crasher*))

The syntax of dynamic is like that of define, but the declarations
are made in the dynamic environment, thus creating dynamic variables
and functions.

Let's say it is associated with the condition when something tries to
divide by zero. It is used this way:

(define (div a b) (if (= b 0) (crash (*zero-crasher*)) (/ a b)))

When we encounter a b which is zero, we crash the program by
executing the crasher. The special form crash evaluates its
argument (here it executes the function that I call the crasher) and
crashes the program.

(div 6 0)

This, for example, prints *ZERO-CRASHER* and crashes.

This error handler works globally until it's overwritten:

(let ((a 5)
      (b 0))
 (set! *zero-crasher* (lambda () (writeln 'test:from-let)))
 (div a b))

Now we can specify the handler we want. So here, in this particular
dynamic scope (this let), we have our own error handler, our own
crasher!!

Of course its usage is limited and it is not powerful enough for many
use cases. It would be possible to implement it so that the program
can continue execution afterwards. But even if I implemented that,
there is a problem: this kind of exception handler seems to become an
anti-pattern. When stack-unwinding exception handlers are used, the
code becomes error-prone and difficult to reason about. This is the
reason why I didn't implement it completely, just a minimal subset in
order to see dynamic scope in action. There is also another problem,
even if it can somehow be solved: there needs to be a way to jump
from the handler through the stack to the place where the handler was
defined. That is actually difficult to solve without rewriting a lot
of things. A possible strategy would be to return a special value,
but then it would have to be checked everywhere. Scheme values are
objects in my implementation anyway, so an integer value such as 0
could signify that an exception has been thrown and needs to be
caught. This situation is similar to the situation with
continuations: it is something that has to be specified in the
requirements from the start. And as it is a bug-prone feature even
if done right, I didn't implement it. On the other hand, in a Scheme
with continuations it is not difficult to implement and can be an
interesting toy to play with.

Implementation Details

In this part I will describe some of the implementation decisions I
made for this interpreter. I will not describe every detail of the
implementation, but I will focus on the parts where I had to make a
choice or find a solution to a specific problem.

So, I will start from the very beginning, when the program is first
invoked: how the program mode of execution is chosen, and so on.

Then I'll describe how the input is read. First it goes through the
tokenizer, which leaves only the essential elements of the program,
then there is the parser, which attempts to find a structure in the
program, then this structure is evaluated.

The evaluation is also a rather complex part of the interpreter, and
its various parts will also need to be described. The various
implementation decisions I made, like the way function calls are
made, how the environment works, and so on, will be given some
attention in this section.

The Structure of the Code

So a good way to describe the structure of the code is to show in what
files it is subdivided. This is a rather simple program and it only
has seven source files:

  • PwScheme.ps1 is the entry point of the program. It executes in REPL mode, or, in the case an argument is given, reads the code from a file.
  • Tokenizer.ps1: produces tokens from the input.
  • Bag.ps1. Before giving tokens to the parser, it's better for the parser to have complete expressions, so that further steps can be executed without delay. So the Bag provides an iterator that only yields things that can be interpreted as Lisp objects: either primitive values or pairs/lists.
  • Parser.ps1. Creates the structural representation of the code.
  • Evaluator.ps1: transforms the code into the executing program. As it is complex enough by itself, the previous steps are needed so that it can do its work without worrying about steps not directly related to it.
  • Environment.ps1 represents the environment of execution. There are two environments, lexical and dynamic, and they don't work the same way. But I tried to put them in one program unit nevertheless, because their functionality is somewhat similar.
  • System.ps1. It's a helper unit that is needed for the interaction with the OS. It also contains some predefined functions, like multiplication or modulo, which could be expressed in the language, but it's much more convenient to use functions provided by PowerShell.

The Entry Point

The file PwScheme.ps1 contains the entry point. Its job is to
start everything. If you look at the code, you will see that it
is divided into two parts. The first part corresponds to the case in
which it's invoked with a single argument, in which case it reads a
file. The second part is for the interactive mode, where its job is
to show a prompt and execute the commands as the user enters them.

As you can see, the Bag is only needed in the second case. This is
because the REPL needs to know whether there is enough input for it
to be evaluated. When reading from a file, this problem does not
occur: the program can just read until the end of the file, and if an
expression is unfinished or illegal, it cannot change during the
execution of the program.

Another interesting point is how the exit function sends an exception
to the REPL in order to tell it that it wants the program to
terminate. So, yes, I use an ExitException. I don't know how the
people behind PowerShell look at it, but in some coding communities
using an exception as a goto is not welcome. But there is no other
goto-like mechanism in PowerShell, and this is one of the cases where
doing otherwise would make the code less efficient and more difficult
to read: I would need to return a special value and every function
down the stack would have to check whether I want to exit or not…

The Tokenizer

One of the things that is very simple and generic in my interpreter
is the tokenizer. It defines a Token class, a specialized class for
representing tokens. The tokenizer reads the input, removes the
comments, interprets primitive values (like characters, strings, and
numbers), and doesn't attempt to create any kind of structure.
Instead, when it encounters a special character, like a parenthesis,
it creates a token for that character.

The Bag

The Bag is also a very simple unit. All it does is make sure that
when the Parser tries to read the input, it gets something with a
balanced number of parentheses. This is the reason why it comes
after the tokenizer: it needs to know whether a parenthesis is inside
a literal (a string or a character) or whether it's a real
parenthesis that's there to give structure to the code.

The Parser

Then comes the Parser. Even though many people say that parsing is
still an unsolved problem, for some simple tasks simple and effective
parsers can be written. This is certainly the case for Lisp. I have
a suspicion that it looks the way it does today partly to make the
parsing problem easier: a lot of the development of parsing
techniques came when Lisp already looked the way it looks now.

So my Parser only has one kind of non-terminal to parse: the
s-expression. That is, when it encounters an open parenthesis, it
knows that this represents the start of an s-expression. When it
encounters a closing parenthesis, it's the end of an s-expression.
And there is nothing more to it. Half of the Parser.ps1 file is
actually taken up by the definition of the class Exp, which
represents an expression in the AST. Basically everything is an
Exp: a literal, a pair, a function…

The Evaluator

Then it's the job of the evaluator to read the structure that the
parser has produced and to execute (or evaluate, in the language of
Lisp programmers) the expressions. What is interesting is that
usually in programming languages there are statements and there are
expressions. People say that statements are executed and expressions
are evaluated. But in Lisp, statements and expressions are the same
thing: expressions are statements and statements are expressions.
If you think about it, it kind of makes sense, because statements and
expressions are both tasks given to the computer to perform, so they
are closely related. The reason why people decided to separate the
two is difficult to understand. Perhaps it has something to do with
the history of how programming languages were created: at first
everything had to be a statement, with only a few arithmetic
expressions here and there, or something similar.
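
A small example of this: in Scheme even a conditional is an
expression that yields a value, so what would be a statement
elsewhere can appear anywhere a value is expected:

(define parity
  (if (= (modulo 7 2) 0)   ; the whole if yields a value...
      'even
      'odd))
(writeln parity)           ; ...which define binds: prints ODD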

Types

There are three kinds of types in my interpreter: primitive types
(number, symbol, string, character, and boolean), pairs (they are
represented by conses), and functions, of which there are three types
(user-defined, built-in, and thunks).

Functions

Functions are an important topic in programming languages, and
particularly in Lisp, which is sometimes even called a functional
programming language. There are many topics related to functions, a
lot of different ways to implement them, and also a lot of things
that are difficult to implement. So here I will present several of
the problems I had and solutions I found while implementing functions
in my PowerScheme interpreter.

First, there are in my interpreter three different kinds of
functions: user-defined, built-in, and thunks. Now, after having
made some choices and looked around, I see that it's a common thing
for a language to have functions of different kinds. For example, C
has functions, macros, and some keywords that look like functions but
are not, like sizeof. Bash has user-defined functions, built-in
functions, and external commands. It's more or less the same
situation in PowerShell. I see this pattern almost everywhere.

So, some functions are defined by the user. These functions are kept
in the form of AST nodes. They also save the environment in which
they were defined, in order not to lose access to non-local
variables.

Now I will show how they are created. There is a specialized
function whose job is to create functions: it's called
Make-Function. It is called from the Evaluate function whenever it
encounters a lambda or a define.

It's interesting that simply by inspecting this one function, you can
tell a lot about how my interpreter works.

function Make-Function($name, $env, $paramsExp, $body) {
    $function = New-Object Fun
    # Save a copy of the definition environment (lexical scope)
    $function.defEnv = $env.Duplicate($name)
    $function.isThunk = $false
    # Collect the fixed parameter names from the list of conses
    $function.params = @()
    $paramsCons = $paramsExp
    while ($paramsCons.type -eq "Cons") {
        $function.params += $paramsCons.car.value
        $paramsCons = $paramsCons.cdr
    }
    # A trailing symbol (other than NIL) is the dotted "rest" parameter
    if ($paramsCons.type -eq "Symbol" -and $paramsCons.value -ne "NIL") {
        $function.dotParam = $paramsCons.value
    }
    $function.body = $body
    # Wrap the function in an Exp so it can be used like any other value
    return New-Object Exp -ArgumentList ([ExpType]::Function), $function
}

It takes as parameters the name of the function, which is here only
for debugging purposes; the environment in which it was defined; the
expression which holds the parameters (in Lisp it is typically a list
of identifiers); and the body, which is executed when the function is
called.

You see that a function is in fact an object. Even though the
authors of PowerShell say that it is not an object-oriented language
(they call it object-based payette_windows_2007), the existence of
objects is rather convenient and makes it easier to write code.

Then the weird thing is that we duplicate the environment. Don't
worry, it's not a deep copy of the environment. It is needed because
the environment itself is an object, and if we don't make a copy, its
state will change (for example the scope level), which is not what is
supposed to happen. Moreover, the Duplicate function makes it
possible for the environment to have branching stacks for each
variable. I will explain this in more detail a little bit later.

Then we mark that it's not a thunk, because it's just a regular
user-defined function.

Then we walk through the linked list of conses and append the
elements to the params field. Then we need to take care of the case
of the dotted parameter. It's really complicated and not very well
tested; I wish it weren't a required feature of Scheme.
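
For reference, the dotted parameter is Scheme's way of accepting a
variable number of arguments: the parameter after the dot is bound to
a list of all remaining arguments. A minimal sketch:

(define (firsts a . rest)   ; rest collects the extra arguments into a list
  (cons a rest))

(writeln (firsts 1 2 3))    ; prints (1 2 3)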

Then, at last, we insert the body into the function and we put the
function into an Exp. A function by itself is not an Exp, but it can
be wrapped in one: remember, an Exp is a type that can hold any Lisp
value, including functions.

Some functions are built in. These are functions that perform some
input and output (like reading files or writing to the console),
arithmetic functions, type checking functions, and the EXIT
function.

These functions are defined in the System.ps1 file. I call it a
unit, because I would like to call it a module, but that word means
something else in PowerShell.

So now, let's look at some examples:

Let's begin with a simple function whose job is to add numbers.
In my implementation only integers are supported:

function SysPlus($a) {
    $result = 0
    foreach ($num in $a) {
        $result += $num.value
    }
    return New-Object Exp -ArgumentList ([ExpType]::Number), $result
}

We receive an array of integers, which we have to add together. So
we add them one by one and then enclose the result in an Exp of type
Number.

The next function is also easy to understand:

function SysWriteLn($a) {
    foreach ($exp in $a) {
        Write-Host -NoNewline $exp
    }
    Write-Host ""
    return $null
}

It has almost the same structure. It returns $null, which signifies
the lack of a returned value. The user should not expect functions
that are made only for side effects to have a value.

The next function is a funny one, it's called EVAL:

function SysEval($a) {
    return Evaluate $a[0] (Make-Global-Environment) (New-Object Environment "#<eval>") $false
}

Yes, it calls Evaluate from the evaluator and passes the value it
receives to it, though with a new global environment. The second
environment is the dynamic environment, which is empty.

Apply, on the other hand, is a little bit larger:

function SysApply($funExp, $argsExp) {
    # $env and $denv are visible here through PowerShell's dynamic scoping:
    # the function expression is evaluated in the caller's environments,
    # then the call itself gets a fresh global and dynamic environment
    $function = Evaluate $funExp $env $denv $false
    $env = Make-Global-Environment
    $denv = New-Object Environment "#<apply>"
    if ($function.type -eq "BuiltIn") {
        return Call-BuiltIn $function.value $argsExp $env $denv $false
    } elseif ($function.type -eq "Function") {
        return Invoke $function $argsExp $env $denv $false
    }
    return $null
}

And it uses either Invoke or Call-BuiltIn depending on whether
it's a built-in function or not.

This is my exception-handling function:

function SysCrash($args) {
    # Evaluate the crasher expression (the caller's $env/$denv are visible
    # through PowerShell's dynamic scoping), then terminate the program
    Evaluate $args[0] $env $denv $false
    throw [ExitException] "CRASH"
}

I know it looks like cheating because I implement exceptions using
exceptions, but it's rather a difficult thing to implement and
requires more work than simply changing a few things here and there.

And here is the function used to read from standard input. It
displays a prompt and waits for a string followed by a newline:

function SysRead($a, $env, $denv) {
    if ($a.length -eq 1) {
        $msg = $a[0].value
    } else {
        $msg = ">"
    }
    $text = Read-Host $msg
    $tokens = Get-Tokens $text
    $exps = Parse-Tokens $tokens
    # A plain foreach keeps $exp in the current scope, so the last
    # evaluated expression can be returned at the end
    foreach ($e in $exps) {
        try {
            $exp = Evaluate $e $env $denv $false
        } catch [EvaluatorException] {
            Write-Output ("Exception in SysRead: " + $($_.Exception.msg))
        }
    }
    return $exp
}

This function first reads from the input, then goes through the steps
of tokenization, parsing and evaluation. There is a loop in case the
user decided to put several expressions on the same line. The return
value is the last expression evaluated.

Next is an example of a function that checks whether the type of a
value is a pair. Here you can see how boolean values are put in the
Exp:

function SysIsPair($a) {
    return New-Object Exp -ArgumentList ([ExpType]::Boolean), ($a[0].type -eq "Pair")
}

And here is the EXIT function. Of course there is nothing
surprising about it.

function SysExit() {
    throw [ExitException] "EXIT"
}

Local Functions and Thunks

Scheme has a construct, called letrec, to define recursive functions
that are local. Here is an example that illustrates how it works:

(letrec ((odd
           (lambda (lst)
            (cond
             ((empty? lst) empty)
             (else (cons (car lst) (even (cdr lst)))))))
         (even
           (lambda (lst)
            (cond
             ((empty? lst) empty)
             (else (odd (cdr lst)))))))
  (writeln (even '(1 2 3 4 5 6)))
  (writeln (odd '(1 2 3 4 5 6))))

The output of this expression is:

(2 4 6)
(1 3 5)

So it defines two local functions in terms of each other. There is a
problem regarding the implementation of letrec: let expressions are
evaluated in the enclosing environment and don't have access to the
other bindings created inside the same let.

Here we see that each of these local functions wants to access the
other, which means both functions have to know about each other.

So, what to do? A simple solution would be to create an environment
for the let parameters, add the names with a temporary value to be
overwritten, and then assign the lambda functions to these variables.
This way, when the functions are defined, they are in an environment
visible to these local functions.

This strategy seems to work more or less fine. But, unfortunately, I
found a problem with this approach. Let's look at the following
code:

(letrec ((f (lambda (x) (+ (g (+ x 1)) 1)))
         (x (g 4))
         (g (lambda (x) (* x 10))))
  (writeln (+ x (f 7))))

Indeed, one of the parameters, x, is not a function, but it depends
on the existence of the function g nevertheless! What happens is
that at the moment the values are assigned, the function g is not
available yet, but the value of x should be defined: it's 40. And
the value of the whole expression is (+ 40 (+ (g 8) 1)) = (+ 40 81) =
121.

How can we make it work? It's actually easy. We have seen that it
works with functions, so that means if we transform (g 4) into a
function, it will work. So how do we do that? It's also actually
easy: (lambda () (g 4)), and nothing more is needed. This kind of
function is called a thunk. The name is said to come from a joking
past tense of "think": the program has to have a little thinking
session in order to obtain the value, and afterwards the value has
been "thunk".

This is how a thunk is created:

function Make-Thunk($exp, $env) {
    $function = New-Object Fun
    $function.defEnv = $env
    $function.params = @()
    $function.isThunk = $true
    $nil = New-Object Exp -ArgumentList ([ExpType]::Symbol), "NIL"
    $cons = New-Object Exp -ArgumentList ([ExpType]::Cons), $exp, $nil
    $function.body = $cons
    return New-Object Exp -ArgumentList ([ExpType]::Function), $function
}

So it's a function evaluated in the lexical environment, without
parameters, marked as a thunk, with the only element of the body
being the corresponding expression; then it is enclosed in an Exp
and returned.

But actually there is another problem with this approach. If we
simply transform the expression into a function, references to it
will refer to a function and will not work anymore: an expression
like (+ x 10) is meaningless if x is a function. It would have to be
transformed into (+ (x) 10).

For this reason a special kind of function is needed, which I call a
thunk in my implementation (in other uses this term can have a
slightly different meaning). This is how it works: the first time x
is needed, the thunk function is executed, the value is stored in the
x variable, overwriting the thunk, and the value is returned. The
next time x is evaluated, the stored value is used. That means the
thunks live only for the purpose of one use; later they are not
needed anymore.
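
PowerScheme's thunks are PowerShell objects, but the same idea can be
sketched in standard Scheme as a memoizing procedure (assuming begin
and booleans behave as usual):

(define (make-memo-thunk compute)
  (let ((forced #f)
        (value #f))
    (lambda ()
      (if (not forced)
          (begin
            (set! value (compute))   ; first use: run the computation
            (set! forced #t)))       ; remember that it has been run
      value)))                       ; later uses return the stored value

The difference in my interpreter is that forcing the thunk also
overwrites the variable that held it, so subsequent lookups do not
even go through a function call.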

The Environment

An environment is a data structure that provides variable entities
when given a name. This is what makes it possible to use variables
that hold values which can be modified.

So an environment has three basic operations:

  • declare a variable
  • assign a value to a variable
  • get a value of a variable

In order to allow variables to be used in blocks, there are two
additional operations:

  • enter a scope
  • leave a scope

After entering a scope, we can declare new variables without modifying
the old ones, while the old ones are still visible. These variables
are created on the same level. When leaving a scope, all the
variables of the current level are forgotten (not necessarily destroyed).
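
Seen from the Scheme side, these operations are exactly what a simple
shadowing example exercises:

(define x 1)     ; declare x at the top level
(let ((x 2))     ; enter a scope and declare a new x on top of the old one
  (set! x 3)     ; assign: only the innermost x changes
  (writeln x))   ; lookup finds the innermost x: prints 3
(writeln x)      ; leaving the scope uncovers the old x: prints 1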

Lexical and Dynamic Environments

PowerScheme has two scopes: lexical and dynamic. In order to have
both I use two environments: a lexical environment and a dynamic
environment.

They can share names, but the lexical environment in my
implementation is used first to look up a value.

In order to tell which one to use when defining variables, I use two
different keywords: define is used for lexically scoped variables,
and dynamic is used for dynamically scoped variables. They are used
the same way. Here are some examples:

(define a 1)
(define b 2)
(dynamic c 3)
(dynamic d 4)

It's also possible to declare lexically and dynamically scoped
functions. Because functions are just values, there is no need for
restrictions here. Moreover, dynamically scoped functions can have
some interesting uses, like for example, for handling exceptions.

So here are some examples of function declarations:

(define (lex-fun b)
  (+ 10 b))
(dynamic (dyn-fun b)
  (+ 100 b))

As you can see, I tried to replicate as closely as possible the
lexical scoping syntax for dynamic scope.

Differences between Lexical and Dynamic Scope

Now I would like to compare lexical and dynamic scope from the point
of view of the implementation.

There are two important differences between the two. When functions
are created, only the lexical environment of the creation is saved
inside the function. It's like when you write a book: throughout its
whole life it will refer to the same events as when it was written.

Here is the code snippet from the Evaluate function that implements
define:

"DEFINE" {
    if ($cdr.car.Type -eq "Symbol") {
        $name = $cdr.car.Value
        $value = Evaluate $cdr.cdr.car $env $denv $false
        $env.Declare($name, $value)
        return $null
    } else {
        # (define (<name> . <params>) <body>)
        $name = $cdr.car.car.value
        $params = $cdr.car.cdr
        $body = $cdr.cdr
        $function = (Make-Function $name $env $params $body)
        $env.Declare($name, $function)
        return $null
    }
}

It's a part of a switch, so the first line checks whether we have
encountered define. Then, I think, the code is easy to follow. For
dynamic the code is very similar. It uses the dynamic environment
for the definition, and specialized functions on the environment,
because declaring dynamic entities can differ from declaring lexical
entities. Either way $null is returned in order not to pass more
values than necessary.

On the other hand, when the function is called, the current lexical
environment is not used at all; it's as if we wanted to isolate the
function from what happens now and let it live in a controlled way by
giving it only the necessary values.

Contrary to the current lexical environment, the contents of the
dynamic environment can be used by the function, and that's where its
name comes from: it's dynamically provided to the function.

Here is the Invoke function that is used to call user-defined
functions:

function Invoke($function, $argsExp, $env, $denv, $tco) {
    $funVal = $function.value
    $params = $funVal.params
    $defEnv = $funVal.defEnv
    $denv.EnterScope()
    $argList, $dotValue = Eval-Args $argsExp $params.length $env $denv
    if (!$tco) {
        $defEnv.EnterScope()
        Extend-With-Args $argList $dotValue $function $defEnv $denv
        $result = Eval-Body $function.value.body $defEnv $denv $true
        $defEnv.LeaveScope()
    } else {
        Extend-With-Args $argList $dotValue $function $defEnv $denv
        $result = Eval-Body $function.value.body $defEnv $denv $true
    }
    $denv.LeaveScope()
    return $result
}

The parameter $env, which represents the current lexical environment,
is used only for the evaluation of the arguments. Then, instead of
it, the definition environment, which was saved during the creation
of the function, is used. The dynamic environment, $denv, on the
other hand, is used as it is and passed to the Eval-Body function
directly.

Tail Call Optimization

My interpreter, PowerScheme, also provides an implementation of the
tail call optimization. When a function is the last of a block of
function, it's possible to perform a simplified version of the call.

In the Invoke function above you can see that for tail calls, that
is, when $tco is true, we do not enter or leave the definition
environment. We simply evaluate the body as if it were the
continuation of our presently interpreted code.

So this is the Eval-Body function, the one that decides whether a
particular evaluation is the last one or not:

function Eval-Body($body, $env, $denv) {
    Define-Defines $body $env
    $cons = $body
    $result = New-Object Exp -ArgumentList ([ExpType]::Symbol), "NIL"
    while ($cons.type -eq "Cons") {
        $tco = ($cons.cdr.type -eq "Symbol")
        $result = Evaluate $cons.car $env $denv $tco
        $cons = $cons.cdr
    }
    return $result
}

When we have the last cons, its cdr is NIL; that is what we test.
Of course, the body itself must be the last item of the function.
Actually, a body is exactly what is needed to perform tail-call
optimization, because the body has its own scope. And since there is
no such thing as a program stack in the interpreter, I only need to
care about the variables. Of course PowerShell has a stack, but from
this code I cannot force it to perform tail-call optimization. For
this project, tail-call optimization simply means that values do not
accumulate in the environment when it's not needed; without access
to the real stack, real TCO is difficult to implement.

The purpose of Define-Defines is to make sure that all the defines
of a given lexical level are available to everyone inside the block.
I will say more about the problems it solves later.

Environment Representation

Now let's look at how the environment is represented internally in
the program. When I wrote my first interpreter, MUSH, I decided to
implement the environment using a hash table, where the keys were
variable names together with their lexical levels (the lexical levels
were a way to avoid saving the definition environment for functions:
each variable was assigned a lexical level in addition to its name).
The values were stacks of values. Later I found out that there were
several other ways to represent the environment: a single list, one
list per frame, and similar. It also seems that the idea of having a
stack per name is a rather common way of implementing the
environment. So, in contrast to the Kotlin interpreter, where I used
a list per lexical level, here I decided to use an associative array
from names to stacks of values.

Now I will describe how the basic operations on the environment are
performed.

  1. Variable declaration.

    The environment contains a variable corresponding to the current
    lexical level. A variable is declared in the current level if it was
    not declared previously, or simply assigned if the variable name and
    the level match. In case the variable exists, but with a level less
    than the current level, the new value is pushed on top of the old
    one: the new value is put in the array and the old one is linked to
    the new value as its next value.

  2. Variable update.

    The variable is looked up by name, and if it's found, it's assigned.
    Very simple.

  3. Variable lookup.

    Similar to update: just find by name and return the value. If it's
    not in the array, then the name cannot be resolved.

  4. Enter scope.

    It's done by increasing the current lexical level. Also not really
    complicated.

  5. Leave scope.

    All the variables of the current level are deleted: they are popped
    from their stacks, and if a stack becomes empty, the name is removed
    from the array. The lexical level is decreased.

Branching Stacks

These are not the only operations performed on the environment.
There is also an operation called Duplicate, which is used when
creating a new function. A copy of the environment is made: not a
deep copy, but only of the array. The reason for this is that if the
array is later modified (for example when the parent block ends and
the level is decreased), we still have access to everything we need
for the time the function is executed.

What it means is that the stacks of the variables can now have
branches. As it is not a deep copy, all the objects in the array are
the same, and the new function and its parent can each add new
elements on top of the old ones; thus branches are created.
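
A sketch of why this matters: a closure can outlive the block that
created its bindings, and the duplicated array must keep reaching
them:

(define add10
  (let ((a 10))            ; a is declared at an inner level
    (lambda (x) (+ x a)))) ; the lambda duplicates the environment here

(writeln (add10 5))        ; prints 15: the let has ended, but a is still reachable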

Define-Defines

One of the implications of branching stacks is that a function is not
defined in its own environment, which is a problem for recursive and
mutually recursive functions. To fix this problem I made a function
called Define-Defines. It is used before the evaluation of the
block, in the function Eval-Body. It makes a quick scan of the
whole block and looks for defines. When it finds one, it adds it to
the environment with a temporary value. This value is not null; it's
actually a structure I call a cell, which contains the lexical level
of definition, the value, and a pointer to the next cell. So, when
the function is created, it duplicates the array, but it copies the
reference to the cell without duplicating the cell itself, which is
important: once all the defines are evaluated, these cells are filled
and their new values become visible both to the function and to the
block where the function is defined.
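
Here is a sketch of the kind of code this makes possible (assuming
internal defines work as described): the first internal function
refers to the second before the second has been evaluated, and the
pre-declared cells make the reference resolve correctly:

(define (parity n)
  (define (even-n? k) (if (= k 0) #t (odd-n? (- k 1))))  ; refers to odd-n? below
  (define (odd-n? k)  (if (= k 0) #f (even-n? (- k 1))))
  (if (even-n? n) 'even 'odd))

(writeln (parity 7))   ; prints ODD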

Global and Local Arrays

This Define-Defines function shows a fundamental difference between
the top level, the zero lexical level, and the other levels. Other
levels can easily be scanned ahead of the real work in order to add
the defines to the environment. For the global environment, on the
other hand, this is not possible, because the current block is
constantly expanding.

Let's say I want to define mutually recursive functions on the
top-level. Let's take an example of a depth-first search of a tree,
where each node has a value assigned to it and a list of sub-nodes,
which can be empty. A node is represented by a cons, which has the
number in the car, and the list in the cdr.

So, here is my version of an implementation that returns the sum of
all the numbers in the tree:

(define (tree-sum tree)
 (let ((n (car tree))
       (sub (cdr tree)))
  (+ n (tree-sum-list sub))))

(define (tree-sum-list list)
 (if (null? list)
  0
  (+ (tree-sum (car list)) (tree-sum-list (cdr list)))))

As you can see, it has two mutually recursive functions, which is not
surprising, since the data itself consists of two types: nodes and
lists. So tree-sum corresponds to nodes, and tree-sum-list
corresponds to lists. The implementation is straightforward.

So, let's test it:

(tree-sum '(5 . ((6 . ()) (9 . ()))))

For this tree, it returns 20.

Another example:

(tree-sum '(1 . ((2 . ((4 . ((8 . ()) (9 . ()))) (5 . ())))
                 (3 . ((6 . ()) (7 . ()))))))

And this one returns 45, as expected.

Now let's look a little closer at the problem of mutual recursion.
The function tree-sum duplicates the environment, which does not
contain tree-sum-list. And it will never appear there, because it
had not been entered yet, if we are talking about the REPL. So how
do we solve this problem?

When it's something local, we can just scan ahead and look for all
the defines, but here the definition is not there yet, so we can't do
it. One solution would be to store tree-sum in some kind of waiting
list, so that when tree-sum-list is defined, we can define tree-sum.
That would be easy if they weren't mutually recursive functions.

So, here is my solution: we do not duplicate the global, top-level,
zero-lexical-level environment. We just say that it's global and the
same for everyone. This way, when tree-sum-list is defined, it can
be used by tree-sum, because tree-sum didn't make a copy and didn't
try to isolate itself from the world, so it knows what's happening
and, when needed, can call tree-sum-list.

Declare and Update for Lexical and Dynamic variables

Now there are still a few things I would like to talk about in this
overview of the implementation details of PowerScheme, namely how
variables are updated.

One observation is that the dynamic scope cannot be updated using the
same method that is used for the lexical scope. It's interesting to
notice that reading and writing lexical variables are somehow
symmetric: when we read, the information flows towards us, and when
we write, it flows from us, using the same path, like a pendulum.

Dynamic variables, on the other hand, are not symmetric. We read
either from above or from the same level, but we write only to the
current level. It is comparable to how we treat information: we can
read old or new books, but we can only write new books.

For this reason there are two different methods for the update
functionality: Update is for the lexical environment, and
UpdateDynamic is for the dynamic environment. What UpdateDynamic
does is first check whether the variable we want to update is in the
array, and if so, it either overwrites the value, if it is on the
same dynamic level, or creates a new value while linking it to the
previous one.

Conclusion about the implementation details

So this was a quick overview of my implementation of Scheme. It's
interesting that there are always a lot of different implementation
possibilities, and surprises when something does not work as
expected. I think that by implementing such programs it's possible
to understand more not only about Scheme, but also about Lisp in
general and about the language used for the implementation. I think
I described the main details that required a comment, and not many
parts of the source code or the functionality of PowerScheme remain
unexplained.

What Else PowerScheme can do

In this section I will describe some interesting things that you can
do with the PowerScheme interpreter. The intended use I described in
the Features and Functionality parts. But very often you can do some
cool things with a programming language in addition to its intended
use.

A REPL

A REPL is a read-eval-print loop: a program which continuously reads
from the standard input, evaluates what it read, and prints the
resulting value. It is useful for testing and for running small Lisp
sessions interactively.

The code for doing it is really simple:

(define (toplevel)
 (writeln (eval (read)))
 (toplevel))
(toplevel)

And this is an example of a REPL session:

ZSH:gproj/scheme-lisp/chap02pwsh>pwsh PwScheme.ps1 tests/toplevel.scm
>: (let ((x 5) (y 3)) (+ x (* y 10))))
35
>: (define (lola x) (writeln 35) (writeln x))

>: (lola 40)
35
40

>: (exit)
EXIT invoked
ZSH:gproj/scheme-lisp/chap02pwsh>

As you can see, it behaves more or less the same way a normal Lisp
should. Note that toplevel calls itself in tail position, so thanks
to tail-call optimization this loop does not grow the environment.
But it's somewhat limited: it doesn't work when an expression spans
several lines.

Y-Combinator

The Y-combinator is a function that, when applied to another
function, returns its fixed point: a function f such that applying
the original function to f gives back f.

This is useful for implementing loops in lambda calculus: by making a
function call its argument with the next value, it can perform
looping calculations. Here is an example from the Lisp in Small
Pieces book (of course there are many other examples in other books,
like addition, subtraction, and so on). It calculates the
factorial.

First the Y-combinator:

(define fix
  (let ((d (lambda (w)
           (lambda (f)
            (f (lambda (x) (((w w) f) x)))))))
  (d d)))

And here is the modified factorial to be used with the fix function:

(define (meta-fact f)
 (lambda (n)
  (if (= n 0)
   1
   (* n (f (- n 1))))))

As you can see, it applies its argument to the next value. Its
argument is the fixed point of meta-fact, which means it reproduces
the same call again with a lower value.

Here is how it's invoked:

((fix meta-fact) 3)

The important thing here is that (fix meta-fact) is equivalent to
f in meta-fact.

By definition of fix, (fix f) = (f (fix f)).

So, ((fix meta-fact) 3)

is equal to ((meta-fact (fix meta-fact)) 3),

which in turn is equal to (* 3 ((fix meta-fact) 2)) if you insert
3 into the definition of meta-fact.

And it continues this way until it returns the value of the factorial.
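
Writing the whole expansion out, following the two steps above, makes
the mechanism easier to see:

;; The full expansion of ((fix meta-fact) 3), step by step:
;; ((fix meta-fact) 3)
;; = ((meta-fact (fix meta-fact)) 3)
;; = (* 3 ((fix meta-fact) 2))
;; = (* 3 (* 2 ((fix meta-fact) 1)))
;; = (* 3 (* 2 (* 1 ((fix meta-fact) 0))))
;; = (* 3 (* 2 (* 1 1)))    ; the n = 0 branch returns 1
;; = 6
(writeln ((fix meta-fact) 3))   ; prints 6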

Function Objects

As I described before, my implementation uses lexical scope. Here I
would like to illustrate one of its most powerful uses, one which, I
would say, makes full use of lexical scope and the lexical
environment.

In fact, in many cases it would be possible to replace closures with
global functions, or to allow closures to capture only constant
values, as Java does. So, in order to illustrate the full power of the
lexical environment, I give an example of function objects, that is,
functions which have an environment associated with them.

(define (counter n)
 (let ((c 0))
  (lambda (cmd)
   (case cmd
     ((inc) (set! c (+ c n)))
     ((dec) (set! c (- c n)))
     ((pr) (writeln c))))))

What this function does is return a function that accepts
commands. It initializes a variable c to 0 and saves the
parameter n, which represents by how much we want to modify c at a
time.

The commands are inc to increment c by n, dec to decrement c by
n, and pr to print the current value.

This is why I find the name function objects well-suited for these
functions: they contain fields (c and n in this case) and
methods (inc, dec, and pr here).

Here is an example of how it's used:

(define cnt (counter 3))
(cnt 'pr)

We declare a new counter with step 3; (cnt 'pr) prints 0.

Let's try to increment it:

(cnt 'inc)
(cnt 'pr)

Now it prints 3. Let's increment it again, but this time twice!

(cnt 'inc)
(cnt 'inc)
(cnt 'pr)

As expected, it prints 9, which is 3 + 2 * 3. Let's check how
decrementing works:

(cnt 'dec)
(cnt 'pr)

It prints 6, which means the values are correctly saved.

And decrementing twice brings us back to 0:

(cnt 'dec)
(cnt 'dec)
(cnt 'pr)

What is missing?

Of course there is no way a Lisp interpreter will have all the
features; some choices have to be made. In this part I will describe
some things that are missing from my implementation of Lisp. There
were several reasons for not implementing a feature: perhaps it was
too complicated, in other cases it was not really useful, or I simply
didn't have time to implement it. So here I will say a few words about
those unimplemented features I thought were worth talking about.

Vectors

So let's begin with the vectors. In fact they are not really needed
for a minimalist implementation of Scheme, because everything they can
do can be done with pairs and conses: reading an element, setting an
element, appending an element. In many cases lists have the advantage
of being easier to use and easier to implement. The way vectors work
is often a little bit obscure and can feel like a way to step out of
Lisp and communicate with the lower-level entities of the operating
system. I think the main reason Lisp implementations support vectors
is that vectors are faster. Lists are linked lists in most
implementations (perhaps it's possible to do it differently, but I
don't think it's common), while vectors are supposed to use an array
behind the scenes. For this reason they can be much faster if you want
to access elements randomly, and they take less memory because they
don't need pointers to the next (and sometimes previous) element. So
it was possible to implement them, but it was not among my priorities:
for me it would have been much more interesting to implement macros or
continuations.
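
As a small sketch of that claim, here is how element access and
(immutable) element update can be expressed with pairs alone. The
names nth and set-nth are made up for this example; they are not
PowerScheme built-ins.

;; Read the k-th element by walking down the list.
(define (nth lst k)
 (if (= k 0)
  (car lst)
  (nth (cdr lst) (- k 1))))

;; "Set" the k-th element by building a new list with it replaced.
(define (set-nth lst k v)
 (if (= k 0)
  (cons v (cdr lst))
  (cons (car lst) (set-nth (cdr lst) (- k 1) v))))

Of course nth takes linear time where a vector access would be
constant, which is exactly the performance argument above.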

Let*

Another feature that I did not implement is let*. It is also
useful in many cases, but a lot of people don't like it. They say
that if you use it, the code becomes more like imperative programming,
and so on. I don't really know if I can agree with that. Perhaps the
fact that it imposes an order of evaluation is bad? To me it's
more like a shortcut for nested lets, where the inner level depends
on the outer level. Many functional programming languages allow such
constructs, and it is not considered bad there. Anyway, I didn't
implement it because it's easy to replace with nested lets, so its
absence doesn't feel really worrying.
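
For example, this is all a let* form would have to expand to, using
the nested lets that PowerScheme already supports:

;; (let* ((x 1) (y (+ x 1)))
;;  (+ x y))
;; is just a shortcut for:
(let ((x 1))
 (let ((y (+ x 1)))
  (+ x y)))   ; evaluates to 3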

Caar and so on

There are a lot of these functions. Really a lot. So many that in my
Kotlin interpreter I support any length for these accessors: you can
type something like caaaaaaar and it will still be interpreted
correctly. But in this interpreter it was not so easy to implement
this feature without making the code needlessly complicated, so I
decided not to do it. Only car and cdr, and that's it. If you
want more, you just take the car of the cdr of a cdr, and so
on. It should be enough for most cases. Either way these accessors are
a bit confusing and not easy to read, and you should not abuse them
when writing code; a function composition will not hurt from time to
time.
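
For example, the common accessors can be defined by hand from car
and cdr whenever they are needed (assuming quoted lists work as
usual):

;; caar, cadr and caddr spelled out as compositions:
(define (caar x) (car (car x)))
(define (cadr x) (car (cdr x)))
(define (caddr x) (car (cdr (cdr x))))

(caddr '(1 2 3))   ; returns 3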

Set-car! and Set-cdr!

Yes, these are also missing. It was perhaps not the right thing to
do, but it is well known that immutability is a good thing in
functional programming languages. Why would you need something
mutable, when you can have the same thing immutable? In the code
examples I used for tests, they were not needed at all. It's
surprising how much you can do without ever needing these, or set!
and other similar functions. In my implementation, set! is present,
but even it is almost never used, so modifying the car or the cdr
of an existing cons feels like it's not the right thing to do. Is it
really needed? Can it be avoided? Let's just do it another way… And
so on. Conses are born the way they are born, and mutating them into
something else just feels wrong. That is why set! is easier to
tolerate: it modifies the whole value, replacing one value by another,
so one can look at it as a permanent shadowing.
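
To illustrate "doing it another way": instead of mutating a pair, one
can build a new pair that shares the untouched half. This is only a
sketch of the idiom, with names invented for the example.

(define p (cons 1 2))
;; instead of (set-car! p 99), build a new pair:
(define p2 (cons 99 (cdr p)))   ; p itself is left untouched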

Continuations

I really wanted to implement continuations. I have spent a lot of
time thinking about what they mean and about ways to implement them.
People say that they are useless and dangerous to use
against_callcc, but in my opinion it's a very interesting
concept. It's an alternative way to look at the execution flow, and
the execution flow is a very important concept to know and to
understand, especially for people interested in programming languages
and compilers. Perhaps continuations seem complicated because the way
the execution flow works is not well understood. Matt Might calls
them the least understood of all control flow constructs
continuations_by_example. They are not just an abstraction that
allows other control flow constructs to be implemented; they are an
important theoretical concept that reflects an interesting point of
view on the problems related to control flow.

In fact continuations make it possible to implement many existing
control flow mechanisms: for example goto friedman_applications_1988,
if, loops, exceptions, and coroutines reynolds_definitional_1998.
But these constructs were not defined in terms of continuations
when they first appeared; it was shown later that they can be
implemented with continuations. So my idea is this: if we understand
continuations well enough, perhaps we can come up with other control
flow mechanisms that have not been described yet. This idea seems
interesting to me, because we don't have a lot of these mechanisms,
and they are all different and not very organized. If we at least
learn how to categorize them and to find out which kind of control
flow is needed in which situation, we can define new control flow
mechanisms that better suit the problem at hand.

So what are continuations? If you read about them, you will find that
the first steps towards understanding them are rather difficult.
Everybody describes them differently; people seem to talk about
different things. Some see them as jumps and describe them from the
assembly point of view reynolds_discoveries_1993. Others see them
as a way to save a point inside a computation, and yet others say
they are a way to pause the program and continue later. George
Springer, for example, calls them the rest of the computation
springer_scheme. So finding what is common to all these descriptions
is not that easy. Also, the examples shown usually have three or four
nested lambda functions, which in itself is not easy to understand.
So now I will give you a quick description of what continuations are.
I actually hope to implement them in my next project, for the third
chapter of Lisp in Small Pieces, and then I will describe them as
thoroughly as I can. That's why I will not say a lot this time.

A continuation is a point in execution. If you have a running virtual
machine and you pause it and then duplicate it, that's not what a
continuation is: continuations concern themselves only with the
execution, not with the data. It's like going back in time, except
that it's only you who travels; the rest of the world continues as it
always has. In other words, a continuation is similar to the address
of a label. Good examples of how continuations behave are exceptions
and longjmp wikic2_call: you don't really go back in time, and the
data that has changed will not revert to its previous state, but you
do go back to the position you were at before. Continuations are in
fact more general than that, but these are good examples of how they
work.

Then I wanted to find a way to include them in the interpreter, and I
thought about how I would do it. My reasoning went like this: I could
save the cons that has to be executed next. Is that a good way to
represent a continuation? The answer is no. I can perhaps finish the
execution of one block, but what do I do next? The block doesn't tell
me what to do next. What would it have to tell me? It would have to
inform me where it came from and what it wanted to do when it
finished. But why should it be the block that carries this
information? Because a continuation can be taken at any point of the
program, and it's impossible to know everything about everybody; so it
has to be the item that is currently executing that knows what has to
be done next. But how can it tell? By itself it doesn't know
anything: normally it is the caller that decides what happens after
the current block returns.

As you can see, it becomes complicated. For continuations, it is not
the caller who decides what to do next; it's the callee that has to
remember what the next step is and tell it to the caller, so that the
caller can execute what needs to be executed. This is the complete
opposite of how everything was written up until now. So let's imagine
one more time what it would look like if I added continuations to my
implementation. The caller knows what has to be done when the callee
returns, so it tells the callee what has to be done next. Then the
caller can forget everything it told the callee, because it's up to
the callee to remember. When the callee returns, it greets the caller
and says: you wanted to do this when I return. Then the caller
executes it. From this example you can see that the one executing
what happens after the callee returns does not have to be the original
caller; it can be a simple loop that reads the next action and
executes it. In this situation it becomes possible to actually save
the continuation and store it in the computation, and when the user
needs it, he can execute it. For these reasons I didn't implement
continuations in my interpreter: it would have made it too complex if
I wanted to keep everything else as it is. Perhaps there are better
strategies for implementing continuations.

What I described here is similar to what Hayo Thielecke calls
continuation-passing style thielecke_continuations. The next thing
to execute is systematically passed to the callee, and in this
setting, to store a continuation all that needs to be done is to copy
the stack, because these next steps are stored as arguments to each
function, each function having the next thing to do passed in by its
caller.
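
Here is a tiny sketch of that style in plain Scheme: every function
takes an extra argument k, the "thing to do next", and instead of
returning a value it calls k with it. The names add-cps and
mul-cps are invented for this example.

(define (add-cps a b k) (k (+ a b)))
(define (mul-cps a b k) (k (* a b)))

;; Compute (2 + 3) * 4, passing "what happens next" at every step.
(add-cps 2 3
 (lambda (sum)
  (mul-cps sum 4
   (lambda (product) (writeln product)))))   ; prints 20

Capturing a continuation then amounts to getting hold of the current
k.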

Exceptions

Having said a few words about continuations, I will now say why I
didn't implement exceptions. Exceptions are actually easier to
implement than continuations: the goal is to execute a handler and to
fall through the stack down to the level where the handler was set.
This can be implemented with continuations, by saving the point where
the exception handler was set, but it can also be implemented by
returning a special value. Then each caller checks for this value and
returns it to its own caller, all the way up to the specialized
function that handles exceptions. As you can see, this implementation
also requires a lot of modifications to the code.
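
A minimal sketch of the special-value approach, written in plain
Scheme (the 'error marker and the helper names safe-div and
average are invented for this example):

;; Return the special value 'error instead of raising an exception.
(define (safe-div a b)
 (if (= b 0)
  'error
  (/ a b)))

;; Every caller has to check for the marker and pass it upward.
(define (average sum count)
 (let ((q (safe-div sum count)))
  (if (eq? q 'error)
   'error          ; propagate to our own caller
   q)))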

Macros

In Common Lisp, one of the most interesting and complicated features
is macros. They allow you to do things that are really unusual from
the perspective of other programming languages. I don't have a lot of
experience with macros. I know more or less how they work: they
receive their arguments without evaluating them, the body does
something with these arguments and generates an expression, and that
expression is then evaluated in place of the macro call. Sounds easy.
But I don't know enough examples to test that they work correctly.
There is also another problem: I am writing a Scheme interpreter, not
a Common Lisp interpreter, which means that testing them would be even
more complicated, because I know even less about macros in Scheme.
This problem, in contrast to continuations, I will not try to solve
right now. Other chapters of Lisp in Small Pieces deal with macros
(mainly chapter 9).
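
For reference, this is what such a macro looks like in standard Scheme
with syntax-rules, which PowerScheme does not have: the arguments
test and body are not evaluated, they are used to build a new
expression, which is then evaluated in place of the macro call.

;; unless evaluates its body only when the test is false.
(define-syntax unless
 (syntax-rules ()
  ((_ test body ...)
   (if test #f (begin body ...)))))

(unless (= 1 2)
 (writeln 'different))   ; expands to (if (= 1 2) #f (begin ...))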

Conclusion

So, to conclude this post, I will say a few words about what this
project gave me, what it allowed me to learn, and where I can go from
here as a result of having finished it.

I used PowerShell for the first time. If not for this project, it's
hard to imagine what could have motivated me to learn this language.
Not because I don't find it interesting or don't like learning new
things; it's more that I find it difficult to come up with ideas for
new projects, especially for a shell language like PowerShell, which
feature-wise doesn't seem too revolutionary (more or less the reason I
still haven't learned fish). I know PowerShell is often described as
the first shell to use pipes with objects, but I find this to be not
much more than a little extra, like syntactic sugar or an interesting
toy. Even though I theoretically understand how revolutionary it
might be for certain tasks, I don't use the pipe on Linux enough to
miss the object-oriented functionality; furthermore, it's not
something that played a vital part in this project.

So, I have written my third interpreter, and my second Scheme
interpreter. My first interpreter was MUSH, a mash-up between CCP and
PL/M. I really like the result, and I think I will re-implement it
one more time with more features. My second interpreter was a Scheme,
for which I used Kotlin. So this is my third one, and I am actually
starting to see a pattern: I think I am beginning to understand the
main structural components of interpreters and how they should be
written. Of course I'm still a beginner, and I still have several
interpreters and compilers to write before I feel comfortable in this
domain.

I have managed to write an interpreter that, from my point of view,
can do some rather interesting things. It has letrec, which
supports mutual recursion. For some reason my letrec test works
neither on Racket nor on Guile; perhaps I have implemented something
that was not intended to be implemented in a normal Scheme…

Mutual recursion can also be done using define statements on the
top-level or on a local level.

The interpreter also has higher-order functions, which I still
consider something rather mysterious and risky, at least when trying
to execute them in the interpreter.

There is also support for function objects: I can create a counter
function and, by passing different messages to it, manipulate the
counter: increase its value, print its value, and so on. It makes the
function object somehow similar to a living organism; it's a bit weird
and not easy to describe.

Another example of what my interpreter can do is the Y-combinator.
You can find it among the examples in the directory with the source
code of the interpreter. It is something that, every time I see it
working, makes me wonder how it can possibly work; it still surprises
me when I see it print the right answer.

There are some interesting concepts this project allowed me to learn.
For example, this is my first attempt at making an interpreter with
tail-call optimization. Now I see that it has two parts, the data and
the control flow, and they are not necessarily connected.

It's interesting to inspect the differences between lexical scope and
dynamic scope: the lexical environment is passed when a function is
created, while the dynamic environment is passed when the function is
called. A good way to understand these concepts is to compare them,
and there is no better way to make a comparison than to include both
in the same project! It is necessary to recognize that dynamic scope
is not something buggy that should be avoided. Both lexical scope and
dynamic scope make sense, and both are complex concepts. What is
dangerous is to approach one with the mindset of the other; then we
can easily get lost. But by itself, dynamic scope is very logical,
useful and entirely coherent. Richard Stallman says that it allows
you to do things that would otherwise be impossible to do in a
practical way stallman_emacs:_1981.

Something I didn't initially plan to do is the exception-handling
part. By reading more about the way exception handling is related to
dynamic scoping, it occurred to me that it wouldn't be so bad to
implement it. But as I don't have continuations or long jumps, the
only thing I could do is crash the program; that's why the exception
handlers in my interpreter are called crashers.

Having implemented all these things, I realize that there are still a
lot of things missing and a lot of things that could be improved.
It's an interesting journey, and every time I finish such a project,
more interesting doors open for improvement.

What is obvious, of course, is that the type system is very limited.
Not only are vectors missing, for which examples of use are extremely
easy to find, but floating-point numbers and fractions would also be
useful, as well as support for very big numbers, which some languages
provide.

Some features that are important in the Lisp tradition could also be
included: continuations and macros. The third chapter of Lisp in
Small Pieces, which I will read next, deals with continuations, so I
intend to include them in the next interpreter, which I have decided
to write in Swift.

Another interesting thing to do is to write a Scheme compiler. It is
also in the tradition of Lisp to combine compiled code with
interpreted code: Lisp was at first a fully interpreted language, but
it had compilers very early on, and compiled and interpreted code have
existed side by side throughout the whole history of Lisp, in many of
its forms: MacLisp, Common Lisp, Racket, Emacs Lisp, and so on. It
seems impossible to fully understand and appreciate Lisp without
having written both an interpreter and a compiler.

So, this is the end of this post about the PowerScheme interpreter. I
hope you enjoyed reading it; see you in the next post.

Bibliography

[queinnec_lisp] Queinnec, Christian. Lisp in Small Pieces.

[given_re:_2009] Given, David. "Re: Coroutines and Go." http://lua-users.org/lists/lua-l/2009-11/msg00576.html, November 2009 (accessed 2019-03-24).

[graham_roots_2002] Graham, Paul. "The Roots of Lisp." January 2002.

[jones_learn_2017] Jones & Hicks. Learn Windows PowerShell in a Month of Lunches. Manning, 2017.

[the_devops_collective_monad_2016] The DevOps Collective. The Monad Manifesto, Annotated. Leanpub, 2016.

[wikic2_homoiconic] "Homoiconic Languages." http://wiki.c2.com/?HomoiconicLanguages (accessed 2019-03-09).

[tanter_beyond] Tanter. Beyond Static and Dynamic Scope.

[payette_windows_2007] Payette, Bruce. Windows PowerShell in Action. Manning, 2007.

[against_callcc] "An Argument Against call/cc." http://okmij.org/ftp/continuations/against-callcc.html (accessed 2019-03-06).

[continuations_by_example] Might, Matt. "Coroutines, Exceptions, Time-Traveling Search, Generators and Threads: Continuations by Example." http://matt.might.net/articles/programming-with-continuations-exceptions-backtracking-search-threads-generators-coroutines/ (accessed 2019-03-07).

[friedman_applications_1988] Friedman, Daniel P. "Applications of Continuations." 1988.

[reynolds_definitional_1998] Reynolds, John C. "Definitional Interpreters for Higher-Order Programming Languages." 1998.

[reynolds_discoveries_1993] Reynolds, John C. "The Discoveries of Continuations." LISP and Symbolic Computation 6(3-4), 233-247, 1993.

[springer_scheme] Springer, George & Friedman, Daniel P. Scheme and the Art of Programming.

[wikic2_call] "Call With Current Continuation." http://wiki.c2.com/?CallWithCurrentContinuation (accessed 2019-03-01).

[thielecke_continuations] Thielecke, Hayo. "Continuations, Functions and Jumps." http://www.cs.bham.ac.uk/~hxt/research/Logiccolumn8.pdf (accessed 2019-03-04).

[stallman_emacs:_1981] Stallman, Richard M. "EMACS: The Extensible, Customizable Display Editor." 1981.
