DEV Community

Cover image for Semantic Analysis (Pogo Pt:9)
Chig Beef
Chig Beef

Posted on

Semantic Analysis (Pogo Pt:9)

Intro

In this series I am creating a transpiler from Python to Golang called Pogo. In the last post we finally made Go from Python, but if we wanted we could write whatever the heck we wanted and it would get translated, even if it wasn't valid.

Before we Fix the Code

I want to run some tests that I forgot to add into the last post to show why we are even compiling Python to Go. We are going to test the performance difference between the 2.
Here is the Python code we are going to test.

from GoType import *
for i in range(2, 100_000):
    prime: bool = True
    for j in range(2, i):
        if i%j == 0:
            prime = False
    if prime == True:
        print(i)
Enter fullscreen mode Exit fullscreen mode

Here we are printing out every prime number that is lower than a million. Running this through our beautiful compiler gives us this code as output.

package main

func main() {

    for i := 2; i < 100_000; i++ {
        var prime bool = true

        for j := 2; j < i; j++ {
            if i%j == 0 {
                prime = false
            }
        }

        if prime == true {
            println(i)
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Look at that! Nicely formatted thanks to gofmt, and logically the same. Now remember, we aren't trying to run the most efficient way to find prime numbers, both languages have the handicap of using an inefficient technique, but which one is faster (answer should be obvious)?

The Results

I ran the command python3 test.py (after adding timing) and let it run... and it kept going, printing out every prime number along the way. I knew this would take a long time so in the meantime I can focus on something else. That's the great thing about Python, it's so slow it forces you to have a break. Now keep in mind also my laptop isn't exactly "cutting edge."
276.3s
That's over 4 minutes.
Now lets try Go to see if we made any improvement.
10.05s
That's over a 27x time improvement! That's such a massive difference it makes you question why you would ever write Python.
(Side note, Go took 891.246 seconds to calculate up to a million).

Semantic Analysis

Now that we've had fun writing a crude performance test we can continue to work on our compiler. Firstly, we need to actually make an analyzer with an analyze method attached.

type Analyzer struct {
}

func (a *Analyzer) analyze(s Structure) error {
    return nil
}
Enter fullscreen mode Exit fullscreen mode

The analyzer will take in a Structure, and returns an error.

Undeclared Variables

In Go we can't start changing variables if the variable doesn't exist, and in Python the same is true but its syntax makes it look the same as a manipulation. We need to write code to check for this. Firstly, variables need to stay within their scope, which stops variables from one function being used in another (I know we don't have functions yet but might as well implement this feature for every other nested structure).
To start coding this feature, an input parameter of analyze will be a slice of variables that are available in this scope.
For now all a variable needs to hold is its name and its type.

type Variable struct {
    name    string
    varType string
}
Enter fullscreen mode Exit fullscreen mode

We have to use varType instead of type because type is a keyword. We can then pass a slice of these Variables as a parameter.
There is only one scenario where we should be appending to the slice, and that is when we come across a declaration. We can easily check whether we are in a declaration by checking for the structure code attached to s. However, if we check whether s is a declaration and limit the scope of the variable to s, then that doesn't make sense, as the variable can only be used within the declaration. We don't want that, we want the variable to be available outside the declaration, but only by one level.
In this way, we have to check s's children for declarations.

for i := 0; i < len(s.children); i++ {
    if s.children[i].code == structureCode["ST_DECLARATION"] {
    }
}
Enter fullscreen mode Exit fullscreen mode

If we come across a declaration, then we need to add the variables name and type to a new Variable and save it to our slice of Variables.

n := s.children[i].children[0] // IDENTIFIER - name
t := s.children[i].children[2] // IDENTIFIER - type
v := Variable{n.text, t.text}
vars = append(vars, v)
Enter fullscreen mode Exit fullscreen mode

Perfect! Now that we have this for loop over all children, we may as well also use this to analyze each of s's children.

err := a.analyze(s.children[i], vars)
if err != nil {
    return err
}
Enter fullscreen mode Exit fullscreen mode

When we put this code above the code just before, we check that variables used exist before adding variables to vars. In this way, we can't use the variable we are declaring as part of the declaration, for instance, x: int = x isn't valid, just as it shouldn't be. This doesn't mean that x = x + 5 isn't valid however.
Now we need to start prohibiting the use of undeclared variables. First, lets check manipulations for undeclared variables.

if s.code == structureCode["ST_MANIPULATION"] {
    name := s.children[0].text
    var valid bool
    for i := 0; i < len(vars); i++ {
        valid = false
        if name == vars[i].name {
            valid = true
            break
        }
    }
    if !valid {
        return errors.New("[Analyzer (analyze-manipulation)] An attempt to manipulate an uninitialized variable was made")
    }
}
Enter fullscreen mode Exit fullscreen mode

In this we only check the first variable, which is the variable we are assigning to. We will check the other variables when checking expressions. We first get the name of the variable, then we look through all the variables that we have, and if we don't find it, then we throw an error.
Trying to run the following code throws an error.

from GoType import *
x: int = 5
y = 7
Enter fullscreen mode Exit fullscreen mode

However, if we let x = 7 instead it runs just fine. Now we need to work on expressions, because the compiler currently allows for x = z, which doesn't exist.

else if s.code == structureCode["EXPRESSION"] {
    for i := 0; i < len(s.children); i += 2 {
        name := s.children[i].text
        var valid bool
        for j := 0; j < len(vars); j++ {
            valid = false
            if name == vars[j].name {
                valid = true
                break
            }
        }
        if !valid {
            return errors.New("[Analyzer (analyze-expression)] An uninitialized variable was used in an expression")
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

This code is quite similar to the code beforehand, but you may notice an outer loop. This loop increments by 2 instead of one. This is done so that we only check literal children instead of operators.
We add this check at the start of the loop, because we don't want to check whether 7 has been declared.

if s.children[i].code != structureCode["IDENTIFIER"] {
    continue
}
Enter fullscreen mode Exit fullscreen mode

Now we may as well do the same thing for comparisons, which is just a copy paste so I won't add the code here.
There is one more place that we are able to use undeclared variables, and that is in print statements.

else if s.code == structureCode["ST_CALL"] {
    name := s.children[2].text
    var valid bool
    for i := 0; i < len(vars); i++ {
        valid = false
        if name == vars[i].name {
            valid = true
            break
        }
    }
    if !valid {
        return errors.New("[Analyzer (analyze-call)] An uninitialized variable was used in a function call")
    }
}
Enter fullscreen mode Exit fullscreen mode

In future, function calls may take many parameters, but for now we will only be checking one since that's all we use for print. We also add the extra check like before as to whether the variable is an identifier or not.

That's all for our semantic analyzer right now, as we keep adding more features, we may need to come back and update it so that it can keep us from writing bugs into our Go.

Next

I don't actually know what I will be adding to the compiler next. But there are some options so I'll list them out.

  1. Starting the optimizer, most likely adding constant folding
  2. Adding classes
  3. Adding functions
  4. Working around with types more.

Thanks!

Most of the reason I don't know what I'm doing in the next post is because I didn't expect to be needing to really write these posts. That was until I saw that people were not only reading these posts, but adding them to their reading list. It was great to see that people are keeping along with the project, and are hopefully learning about compilers.

Top comments (2)

Collapse
 
joshw92 profile image
Joshua

Nice post about semantic analysis in Pogo! How are you planning to handle error handling in future updates?

Collapse
 
chigbeef_77 profile image
Chig Beef

That's a good point, I should probably deal with that because it would make the compiler so much better. I think I've already made tomorrow's devlog but I'm thinking I'll add line numbers to the errors? And also make it easier to find where in the compiler the error is coming from (because we want errors that are useful for both me (As the compiler designer) and the person using the compiler. Thank you for helping me with this project (: