DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

Basile B.
Basile B.

Posted on • Updated on

Comments on styx operator overloading

Introduction

Since serveral years I work on a hobby programming language called styx, one important feature to add was the ability to overload operators.

For example a compiler does not know how to compare two unrelated values (e.g struct), and a naive auto-generated comparison of two related values goes from bad (e.g compare all the members) to very bad (e.g bitwise comparison). The bad method may handle fields that are not representative of the instance identity and the very bad one may return wrong results because of pointers and fat pointers, skipping comparison of what the member points to.

Being able to overload the comparisons solves that (assuming you want to use operators instead of calls of course)

I will comment and explain the path adopted by styx.

Examples from other languages

Let's take a look at the syntax of several others system of operator overloading. I'll take systems that are based, like Styx, on member functions, and that allow to overload calls.

struct S // D
{
    auto opCall(){}
    //   ^-----special identifier used to recognize the overload
}
Enter fullscreen mode Exit fullscreen mode
struct S // c++
{
    void operator()() const {}
    ///          ^-----special toks used to recognize the overload
}; 
Enter fullscreen mode Exit fullscreen mode

We can recognize two styles.

The D style is based on predefined identifiers. That's not a 100% user friendly solution because you have to remember the name of the method, but that's a clear compiler friendly system because it is fast to find if something is overloaded. A single lookup in a symbol table and then the overload can be tried.

The C++ style is user friendly, the () stands for a call, it's easy to remember. However the compiler-side of that system is not good. In particular I'd criticize the fact that a specific grammar rule is required to parse operator overloads.

The styx way

Styx operator overloading syntax does not require a new language construct and is not based on special identifiers. It uses a parametric function attribute.

Styx attributes are of the form

Attribute ::= "@" Identifier ("("Expression ("," Expression)*")")?
Enter fullscreen mode Exit fullscreen mode

For example to indicate to the compiler whether or not it has to inline a function

@inline(true) function inlineMePlz();
@inline(false) function dontInlineMePlz();
Enter fullscreen mode Exit fullscreen mode

Operators re-use that syntax but contrary to C++, a full expression, no just symbols, defines what the function is supposed to overload and that does not require special cases in the parser.

struct S
{
    @operator(a(b)) function overloadCall();
}
Enter fullscreen mode Exit fullscreen mode

This works fine because the compiler uses named values, the operators, to distinguish the type of a concrete expression. For example this heavily stripped version of the CallExpression shows that its operator is set to leftParen. (note that setOperator() is called by an inherited constructor)

@final class CallExpression : UnaryExpression
{
    @override function setOperator() { operator = leftParen; }   
}
Enter fullscreen mode Exit fullscreen mode

The semantics are checked using the visitor patten. The visitor does not dispatch using dynamic casts but using the nodes operator. (to explain that quickly this is because in a compiler we often only care about the most derived type of a node but dynamic casts search to match the identity in the whole inheritance list, which is superfluous)

@inline function visitConcreteExpression(Expression* e)
{
    var Expression** ae = &e;
    switch e.operator do
    {
    on equal .. lesserEqual do visit( *(ae:CmpExpression**) );
    on plus         do visit( *(ae:AddExpression**)         );
    // etc...
    on leftParen    do visit( *(ae:CallExpression**)        );
    // etc....
    }
}

Enter fullscreen mode Exit fullscreen mode

And when a CallExpression is visited, it is just required to check if the aggregate instance used as this parameter contains a function attached to an @operator attribute that has for argument an expression with the leftParen operator.

Before checking the default semantics of an expression the following routine is called. (it is voluntary fully copied to show that processing styx operator overloads is very generic).

/** Returns: `null` if operator overload is not possible and a CallExpression otherwise.
 * If the result is valid, its semantic is not yet run. */
function tryOperatorOverload(Expression* e; Expression*[] args): Expression*
{
    assert(args.length);
    assert(args[0].type, "call tryOperatorOverload after una.exp or bina.left sema");

    // only allowed on custom types or on pointer to custom types (DotExp implicit deref)
    var auto tp = args[0].type.asTypePointer();
    var auto ta = tp ? tp.modified.asTypeAggregate() else args[0].type.asTypeAggregate();
    if !ta do return null;

    // be sure that the aggregate `hasOpOvers` is defined
    if ta.declaration.progress != SemanticProgress.done do
        assert(0, "cannot try opover if aggregate declsema is not run");

    var auto ad  = ta.declaration;
    var auto ads = ad.symbol;

    if !ad.hasOpOvers do
        return null;

    // handle non-overridden opovers
    var Type*[+] chain;
    getInheritanceChain(ta, chain);
    chain ~= ta;

    foreach var auto t in chain.reverse do
    {
        var auto ts = asTypeAggregate(t).declaration.symbol;
        foreach var auto s in ts.children do
        {
            // on every member funcs and overloads ...
            var auto fd = symToFuncDecl(s);
            var auto od = symToOverDecl(s);
            var auto funOrOverDecl = fd ? od;
            if !funOrOverDecl do
                continue;

            // ... that are annotated @operator....
            var Attribute* oa;
            if (oa = funOrOverDecl.getAtOperator()) == null do
                continue;

            // dont check param count if func is an overload set
            var auto parametersMatch = true;
            if fd do
            {
                if fd.isStatic() do
                    parametersMatch = fd.parameters.length + 1 == args.length;
                else do
                    parametersMatch = fd.parameters.length == args.length;
            }

            foreach var auto p in oa.arguments do
            {
                var bool isMacroOp;
                if var auto oom = asOpOverloadMacro(p) do
                    isMacroOp = getMacroOperatorKind(e) == oom.kind;

                //... that handles the right operator...
                if (p.operator == e.operator || isMacroOp) && parametersMatch do
                {
                    var auto id = (new IdentExpression).create(args[0].startPos, funOrOverDecl.name,
                                   fd ? fd.returnType else od.asTypeDeclared, funOrOverDecl.symbol);

                    var auto de = (new DotExpression).create(args[0].startPos, args[0], id);
                    de.symbol   = funOrOverDecl.symbol;
                    de.type     = fd ? fd.returnType else od.asTypeDeclared;

                    var CallExpression* ce  = (new CallExpression).create(args[0].startPos, de);
                    ce.arguments            = args[1..$];

                    // ... rewrite as a CallExp
                    ce.isOpOverRewrite = true;
                    return ce;
                }
            }
        }
    }
    return null;
}  
Enter fullscreen mode Exit fullscreen mode

If the routine returns a non-null value, operator overloading has worked and default semantics are skipped.

Again, using the expression operators makes the processing of operator overloads easy and very generic, but not fully...

The dark side of styx operator overloads

Unfortunately, and in case you would not be aware, a compiler has often to deal with special or corner cases.

Compound expressions

While using a dummy expression to recognize what needs to be overloaded works generally fine, there is the problem of the expression compounds.

Let's take the example of assigning to an IndexExpression

a[0] = 1;
Enter fullscreen mode Exit fullscreen mode

Let's define a structure that overloads the two expressions used in the example

struct A
{
    @operator(a=b) assign(u64 value){}
    @operator(a[b]) getElem(u64 index): u64 {return 0;}
}
Enter fullscreen mode Exit fullscreen mode

Finally let's rewrite manually the example

var A a;
a.getElem(0).assign(1);
Enter fullscreen mode Exit fullscreen mode

This does not work, the correct this is lost and if by error getElem() would return an A it would be a rvalue, i.e not suitable for an assignment.

So in the almost perfect world of the styx operator overloads there are cases which require special processing.

That processing consists into looking at the sub expressions nested in an expression to recognize and define a "macro operator" that replaces the standard operator. The working version of A is finally

struct A
{
    @operator(a[b]=c) assignToElem(u64 index; u64 value);
}
Enter fullscreen mode Exit fullscreen mode

The singular case of the BoolExpression

Styx allows to overload the BoolExpression (e.g true, false). The support has been added because styx lets the user write shortened conditions such as

var s8[] array;
if array do {} // lowered to `if array.length != 0 do ...`
var s8* ptr;
if ptr do {} // lowered to `if ptr != null do ...`
Enter fullscreen mode Exit fullscreen mode

Overloaded BoolExpressions are used when shortened conditions are checked. For an aggregate this takes the following form

struct S
{
    @operator(true) function toCondition(): bool {return true;}
}
Enter fullscreen mode Exit fullscreen mode

But can be written

struct S
{
    @operator(false) function toCondition(): bool {return true;}
}
Enter fullscreen mode Exit fullscreen mode

since what is determining is the expression operator and not its value.

But if you digest the fact that this is confusing you can for example use aggregate instances as condition in a ConditionalExpression, for example

struct S
{
    @operator(false) function toCondition(): bool {return true;}
}
var S s;
// same as `s.toCondition() ? 0 else 1`
var auto v = s ? 0 else 1; 
assert(s == 0);
Enter fullscreen mode Exit fullscreen mode

Finally

There's no big conclusion here, all programming languages are unperfect. Styx operator overloading works generally well, to a few exceptions. The most important being that a user friendly syntax was found and that by chance that syntax turned out to be also useful in the implementation.

See also

Top comments (0)

🌚 Browsing with dark mode makes you a better developer.

It's a scientific fact.