Mike Samuel

An alternative to "Distinguishing an Interpreter from a Compiler"

This is a response to Distinguishing an Interpreter from a Compiler; in it I propose a capture-the-flag approach to distinguishing between compilers and interpreters.

any language can be implemented as an interpreter or a compiler

Laurence Tratt notes as he tries to define crisply what the difference is. And he quotes Mario Wolczko's observation:

For example, suppose the source program reads a number and then has a loop whose trip count is this number. The time it takes to compile this program will be independent of the number, in contrast to the time it takes to interpret the program.

This gets at the heart of the difference between static (considering all possible runs of a program) and dynamic (focused on one run of a program with known inputs) analysis.
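To make Wolczko's example concrete, here's a minimal Python sketch of my own (an illustration, not code from either post). The time to compile this program depends only on the size of its source; the time to run it grows with the number it reads.

import sys

# The loop's trip count comes from the input, so a single run's cost
# grows with n, while the source (and hence compile time) stays fixed.
n = int(sys.argv[1])
total = 0
for i in range(n):
    total += i
print(total)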

Tratt defines compilation in terms of partially evaluating program elements: trading work done ahead of time for work saved at runtime. That ahead-of-time work can be amortized over many program runs, or over a single run on a large input. He boils this down to some big-O notation tests; crisp and mathematical.

I dislike tying compilation to performance, though; I'm skeptical of focusing on one goal because compilers, imho, can improve programs along multiple dimensions.

So I'm going to try to broaden this definition to account for other kinds of improvements, which means I can't lean on the nice big-O-related math and need a different technique.

But first, let's get out of the way things that won't help us put a toolchain on the compiled/interpreted spectrum.

Semantics should not differ between compiler and interpreter

When writing a compiler for a language that previously only had an interpreter available, one tries to preserve the semantics of programs.

So we can't rely on obvious differences in semantics. That said, many PLs have reflective affordances and underspecified behaviour that often will differ. I'll show some ways to abuse introspection and debugging to test my definition.

Performance is more an indicator of toolchain maturity than of whether compilation happened

In 2004, when Gmail launched, JavaScript engines were relatively immature. Gmail relied heavily on a JavaScript compiler to reduce the network bandwidth required to load what was then the single largest JavaScript program, and to make Gmail perform acceptably.

Today, un-pre-compiled JavaScript easily out-performs 2004's pre-compiled websites, because modern interpreters (albeit ones with just-in-time compilers) are much better than the JScript 5 engine used in IE6. Google and Mozilla invested an enormous amount of engineering in their engines.

So what does a good compiler do?

A well-compiled program needs less to do the same job:

  • less time or
  • fewer system resources or
  • fewer instructions or
  • a smaller runtime or
  • less risk of leaking private information or
  • less of something else 🤷

and it does that based on considering what all possible runs of the program need and removing things they don't.

Maybe they don't need to perform that expensive operation inside a loop, and instead it can be done once outside.

Maybe they can replace that loop with a single vectorized instruction.

Maybe they don't need that function at all. It's in a library that other programs may use but not this one.
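To make the first of those concrete, here's a hand-done Python sketch of loop-invariant code motion, with a stand-in expensive() function of my own; real compilers do this transformation on an intermediate representation, not on source text.

def expensive():
    # Stand-in for a costly computation that doesn't depend on the loop.
    return sum(range(1_000_000))

# Before: expensive() is re-evaluated on every iteration even though
# its result never changes inside the loop.
def scale_naive(items):
    return [x * expensive() for x in items]

# After loop-invariant code motion: same result, one call instead of
# len(items) calls.
def scale_hoisted(items):
    k = expensive()
    return [x * k for x in items]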

I propose that a tool-chain is compiler-y to the degree it lowers the requirements for a program to run.

A practical test for compilation

How does this awfully fuzzy definition let us distinguish between a program that is running after having been compiled, and one where the interpreter is considering instructions for the first time upon reaching them?

Capture the flag (CTF) contests are a long tradition in computer security:

an exercise in which "flags" are secretly hidden in purposefully-vulnerable programs

If we can embed flags in a program that, per the program's semantics, shouldn't be needed, and then skilled flag-capturers fail to find them, that's a strong signal that a compiler identified them as unnecessary and got to them first.

Below I work through some programs in various languages doing roughly the same for each:

  1. I write a program that mentions a string value, "FLAG", but whose semantics do not depend on it.
  2. I run the tool-chain for the program.
  3. I try to find the value "FLAG" somewhere in an intermediate representation or at runtime in the program.

(Caveat: I'm not particularly good at capture the flag; I was always better at writing the challenges. But I know the basics, so please bear with me.)
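Here's a minimal Python sketch of step 3, a hypothetical helper of mine that does roughly what strings artifact | grep FLAG does: scan a tool-chain artifact for the flag bytes.

import sys

def captures_flag(path, flag=b"FLAG"):
    # Look for the flag bytes anywhere in the artifact.
    with open(path, "rb") as f:
        return flag in f.read()

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "flag captured" if captures_flag(path) else "flag compiled out")

Finding the flag this way only proves it survived; not finding it is the interesting signal.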

Capture the flag against a C compiler

#include <stdio.h>
#define PRINT_FLAG 0

int main(int argc, char** argv) {
  char* flag = "FLAG";
  if (PRINT_FLAG) {
    printf("%s\n", flag);
  }
  char *other = "OTHER";
  printf("%s\n", other);
  return 0;
}

We can see that the optimization level, set by gcc's -O flag, matters.

$ gcc -O0 ctf.c
$ ./a.out
OTHER
$ strings -a ./a.out
FLAG
OTHER
$ gcc -O1 ctf.c
$ ./a.out
OTHER
$ strings -a ./a.out
OTHER

I compiled the above C program two ways, and each time it had the same behaviour. But looking in the binary with strings showed that, with any optimization enabled (-O1 or higher), the binary has the string "OTHER", which is needed, but not the string "FLAG", which is not.

By embedding a secret and looking at the program from the outside, we can distinguish between a run of gcc -O0, which intentionally does no optimizations, and one that does. (That may seem surprising. More on that below.)

Capture the flag against a JavaScript engine with and without pre-compilation

We can use a debugger to get a sense of what is available. If a program is pre-compiled, the debugger simply won't have access to flags that were removed. For example, running the C program above under GDB will simply skip over char* flag = "FLAG"; as if it never existed.

Here's our sample JS:

<script>
  function createClosure() {
      var flag = "FLAG";
      // Technically, this function closes over flag.
      return () => undefined;
  }

  var closure = createClosure();
</script>

It calls a function to create a closure that closes over var flag but which never uses it. Since this is in a <script> tag, the var closure is attached to the window object so this whole program isn't a no-op.

If I run this program through Closure Compiler, a JS code minifier, I get:

[Screenshot: output from running the program (sans script tag) through Google's Closure Compiler with simple optimizations, showing that the `var flag` line is removed from the output]

Note that the "compiled code" on the right-hand-side is similar but lacks comments and the line var flag = 'FLAG'.

Note that this is not technically a semantics-preserving transformation, because it affects the semantics of calling createClosure.toString(); stringifying a JS function value dumps its source code. See the "hide source" directive proposal for more details. That requirement does not prevent a compiling JS engine from internally ignoring the instruction, though.

If I load that same HTML uncompiled in Chrome's debugger, I can see that Chrome does not eliminate the unneeded flag variable.

[Screenshot: that HTML loaded into Chrome's built-in step debugger, with focus on the line `var flag = 'FLAG'`]

The flag variable is available to the debugger which helpfully tells me that it's executing that line.

[Screenshot: the same step-debugger view after control has stepped to the next line]

You can see that after I step to the next line, the debugger shows, next to the previous line, that the value "FLAG" was bound to the name flag.

So V8 is not compiling out things that gcc -O1 does. V8 (Chrome's JS engine) does a lot of other just-in-time compilation, but not this particular optimization. It may be that V8 does fewer optimizations specifically when you invoke the debugger, but that is still a choice to be less compiled when being interactively debugged. Quite possibly, V8 runs this code interpreted because it is not a performance hotspot.

So our definition based on needing less also works when compilation is an optional step for a toolchain.

Capture the flag against Python

Is Python compiled? CPython spits out .pyc files for each .py file it loads, so it seems like this is an example of compilation at load time.

import dis

def f():
    flag = 'INITIAL_VALUE'
    if False:
        flag = 'FLAG'
    print('%s' % flag)

dis.dis(f)

This program has a function that assigns a value to a local variable, then, because of the if False: branch, does not assign the 'FLAG' string to that variable, before printing the value stored in it.

The function itself is needed because another module could load this module and call it, so it can't be entirely eliminated.

And we can use Python's internal disassembler (module dis) to look at that function.

$ python3 ctf.py
  3           0 RESUME                   0

  4           2 LOAD_CONST               1 ('INITIAL_VALUE')
              4 STORE_FAST               0 (flag)

  5           6 NOP

  7           8 LOAD_GLOBAL              1 (NULL + print)
             20 LOAD_CONST               4 ('%s')
             22 LOAD_FAST                0 (flag)
             24 BINARY_OP                6 (%)
             28 PRECALL                  1
             32 CALL                     1
             42 POP_TOP
             44 LOAD_CONST               0 (None)
             46 RETURN_VALUE
$ python3 -m py_compile /tmp/ctf.py
$ strings /tmp/__pycache__/ctf.cpython-311.pyc
INITIAL_VALUEF
FLAGz
print)
flags
ctf.py
disr
<module>r

So I ran the program. Notice the NOP instruction at offset 6. That's where I would expect a LOAD_CONST ... ('FLAG') if the second assignment had not been compiled out.
It looks like Python's internal compilation removes branches like if False:; python3 wins the capture the flag.

Then I asked python3 to generate a .pyc file with python3 -m py_compile. The strings in that file do contain "FLAG". PEP 3147 notes that with .pyc files, "the compilation phase can be bypassed." Perhaps not all of the CPython compiler's transformations are reflected in the .pyc file. I'm not sure whether that qualifies as a capture-the-flag win or not.
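One more way to poke at this from inside the program: check the function's constant pool directly. This is a sketch of mine; whether 'FLAG' survives in co_consts may vary across CPython versions, so treat the printed result as the experiment rather than a foregone conclusion.

def f():
    flag = 'INITIAL_VALUE'
    if False:
        flag = 'FLAG'
    print('%s' % flag)

# co_consts is the code object's static constant pool. If the compiler
# dropped the dead branch's constant along with its instructions,
# 'FLAG' won't appear here.
print('FLAG' in f.__code__.co_consts)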

Capture the flag against Java

public class Ctf {
    public static void main(String... argv) {
        String flag = "INITIAL";
        if (false) {
            flag = "FLAG";
        }
        System.err.println(flag);
    }
}

Our code here is similar to our Python example: we initialize a variable, then, because of if (false), we don't change it to "FLAG". Finally we print it, so that we need the initial value but not the one we want to capture.

If I compile and disassemble the .class file, I see that Java wins.

$ javac Ctf.java
$ javap -c -constants -p Ctf
Compiled from "Ctf.java"
public class Ctf {
  public Ctf();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String...);
    Code:
       0: ldc           #7                  // String INITIAL
       2: astore_1
       3: getstatic     #9                  // Field java/lang/System.err:Ljava/io/PrintStream;
       6: aload_1
       7: invokevirtual #15                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      10: return
}
$ strings Ctf.class
java/lang/Object
<init>
INITIAL
java/lang/System
Ljava/io/PrintStream;
java/io/PrintStream
println
(Ljava/lang/String;)V
Code
LineNumberTable
main
([Ljava/lang/String;)V
SourceFile
Ctf.java

The .class file does not contain instructions to fetch (ldc aka load constant) any second value from the class's constant pool, and strings shows that the constant pool contains "INITIAL" but not "FLAG".

The Java tool-chain wins the capture the flag, without even getting into its second, just-in-time compilation phase.

Conclusions

A compiler is a needs reduction machine, a program transformation that allows a program to do the same with less; a tool-chain is compiler-y to the degree it lowers the requirements for a program to run.

This definition seems vaguer than Tratt's big-O notation based definition, but recognizes that some compilers aim to provide benefits other than runtime performance.

It may seem vague but it is testable. By understanding the goals of a compiler, we can adversarially test whether it achieves those goals.

To show that, I took one example of less (not needing a string value that appears in the program source but doesn't affect the program's semantics) and showed how to test whether something is or is not compiled by playing capture the flag against these toolchains:

  • gcc -O0, C in "just generate instructions no fancy stuff" mode
  • gcc -O1 through -O3, C with optimizations
  • JavaScript pre-compiled using a code minifier
  • JavaScript not pre-compiled
  • Python running in CPython3 with its load-time compilation and its cached source files
  • Java compiled via javac

And I outlined a number of techniques for testing whether a tool-chain compiles according to this definition:

  • Using the UNIX strings tool to look at tool-chain artifacts, like the a.out executable built by gcc
  • Using a debugger to see if instructions that initialize an unneeded variable are executed
  • Using disassemblers like javap and Python's in-program dis module to look for instructions and static value storage

The results above largely mirror programmers' understanding of their toolchains.

  • Java is compiled ahead of time, just not to a platform-specific instruction set.
  • Python compiles at module load time in a similar way.
  • JavaScript can be pre-compiled to simpler JavaScript if you so wish. Modern JavaScript engines do a lot of binary instruction generation internally, but it's not a tool-chain centred around compilation at the end of the day.

There are some surprises. gcc -O0 fails the capture-the-flag, so according to this definition its output is not compiled. This is a flaw: even at -O0, gcc removes the need for a large runtime, so it still falls into the "does the same with less" paradigm, but this CTF exercise does not capture that.
