DEV Community

amu

Making Custom Abstraction Units for Programming Languages

Consider the concepts of composition, inheritance, encapsulation, etc., and how languages take these concepts and build language units around them. The keywords "class", "struct", and "interface" mean something different in every language. Each language adds its own flair.

I was wondering why the designers of each language decide what they decide about the language's fundamental units. Then I started thinking: if I were to make my own units, how would I go about it? How would I even evaluate whether a language unit that I designed was good or bad? Eventually, as a natural jump, I made a programming language to figure it out in action... just kidding... I just designed some units for an imaginary language.

That's what I want to talk about in this article.

Let's start with a few languages and how their units differ.

Examples

In C#, there are classes with single inheritance, and there are interfaces. There are abstract classes, which are like a normal class PLUS an interface-like set of abstract members that the class doesn't implement and passes down to its children. Then there are sealed classes, which can't be inherited from.

In C++, class A : public B means inheritance, while composition is expressed by holding a B as a member (private inheritance is sometimes used as a composition-like shortcut). There's no restriction to single inheritance: a class can have several base classes. There is also the concept of a friend class, which bypasses encapsulation and gives one class access to the private and protected members of another.

TypeScript allows an interface to declare properties as well as methods. C++ has no dedicated interface keyword; the closest equivalent is a class with only pure virtual functions, which can still carry member variables.

Rust has traits, which are like C# interfaces. But Rust's traits can have default implementations, so they are more like an abstract class that doesn't have variables.

Javascr*** - don't get me started on that.

How to find out which approach is better?

Here is my goal: I want to find a common way to analyze a language unit regarding the concepts I mentioned before.

In the most abstract sense, a piece of code is either data, or a behavior that works with data and may or may not change it as a result. In assembly terms, it's either registers or instructions.

We can start by thinking about our two fundamental units of a language:

  • Variable (data)
  • Method (behavior)

Also consider that a behavior unit can have a default implementation or can be totally body-less. Additionally, a variable can be constant.

Whether variables and methods can coexist within a unit depends on the language designers' choice. For example, C#'s "interface" unit traditionally allowed only body-less behaviors and no fields; since C# 8 it also accepts default implementations and constants.

So, here is the set of all of the behavior-side possibilities for a language unit:
{ methods without default implementation,
methods with default implementation,
no methods }

Here are the data-side possibilities for a unit:
{ vars,
no vars,
only const vars }

Finally, for a language unit, here are all of the combinations of data and behavior, calculated as the Cartesian product of the two sets (DI is short for default implementation):

(methods w DI, no vars) ---> BEHAVIOR HEAVIEST
(methods w DI, only const vars)
(methods w DI, vars)

(methods wo DI, no vars)
(methods wo DI, only const vars)
(methods wo DI, vars)

(no methods, no vars)
(no methods, only const vars)
(no methods, vars) ---> DATA HEAVIEST

We have 9 options for a language unit with different intensities of being data-heavy or behavior-heavy:

  • The behavior-heaviest unit is a unit that contains a bunch of methods with default implementation but doesn't contain any variables.
  • The data-heaviest unit is a unit that only contains variables and no behaviors.
  • And there's everything in between.
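
To double-check the count, here is a small Python sketch that builds the same table with a Cartesian product. The list names BEHAVIOR, DATA and units are my own labels for this illustration, nothing more.

```python
from itertools import product

# Behavior-side and data-side options, listed in the same order as the table.
BEHAVIOR = ["methods w DI", "methods wo DI", "no methods"]
DATA = ["no vars", "only const vars", "vars"]

# The Cartesian product yields every (behavior, data) pairing.
units = list(product(BEHAVIOR, DATA))

for behavior, data in units:
    print(f"({behavior}, {data})")
```

The first pair printed is the behavior-heaviest unit and the last one is the data-heaviest.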

We can map them to examples already existing in different languages:

(methods w DI, vars) => typical class in most languages
(methods wo DI, vars) => typical abstract class
(methods wo DI, no vars) => typical interface
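
As a rough illustration of these three mappings, here is how they might look in Python. Drawable, Shape and Square are hypothetical names I made up for the sketch, and Python's Protocol and ABC are only approximations of what "interface" and "abstract class" mean in other languages.

```python
from abc import ABC, abstractmethod
from typing import Protocol


class Drawable(Protocol):
    """(methods wo DI, no vars): a typical interface."""

    def draw(self) -> str: ...


class Shape(ABC):
    """(methods wo DI, vars): a typical abstract class."""

    def __init__(self, name: str) -> None:
        self.name = name  # data lives here

    @abstractmethod
    def area(self) -> float:
        """Body-less behavior that children must supply."""


class Square(Shape):
    """(methods w DI, vars): a typical concrete class."""

    def __init__(self, side: float) -> None:
        super().__init__("square")
        self.side = side

    def area(self) -> float:
        return self.side * self.side

    def draw(self) -> str:
        return f"{self.name} with area {self.area()}"
```

Square satisfies Drawable structurally, without naming it, while Shape cannot be instantiated until area is implemented.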

Making An Imaginary Language

As a thought experiment, we can take the most extreme sides of the range, and make a language that only has those two units (in addition to the two fundamental units.)

Here are our language's building blocks:

  • Single variable unit
  • Single method unit
  • Collection of variables as a unit
  • Collection of methods as a unit (with default implementation)

The next step is to define how units connect to each other. There are six classic relationship types: inheritance, implementation, composition, aggregation, association, and dependency.

My theory is that only having composition and association relationships would suffice for our language. To me, composition means having a certain quality and association means just using something.

Let's code a short example: designing the logic for a small tower-defense game.

We can consider a bunch of variable-collection units (vc for short.)

vc health
{
    float remaining_amount;
    float max_amount;
}

vc transform
{
    vector2 position;
}

vc entity
{
    bool is_alive;
}

vc agent : has entity, has transform, has health
{
    int hitting_power;
}

vc building : has entity, has transform, has health
{
    list<agent> agents;
    int max_agent_capacity;
}

We compose health into agent, so an agent will contain remaining_amount.
We don't compose agent into building. Instead, building has an association with agent, meaning that it just uses agent objects.
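
For comparison, here is a minimal Python sketch of the same distinction. It is only an approximation: Python has no "has" keyword, so composition is modelled as a field that is created together with its owner, and the default values and the scout and barracks objects are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Health:
    remaining_amount: float
    max_amount: float


@dataclass
class Agent:
    # Composition ("has health"): the Health value is created with the
    # agent and is part of what an agent is.
    health: Health = field(default_factory=lambda: Health(100.0, 100.0))
    hitting_power: int = 1


@dataclass
class Building:
    # Association: the building only keeps references to agents that
    # were created elsewhere; it uses them but is not made of them.
    agents: list[Agent] = field(default_factory=list)
    max_agent_capacity: int = 2


scout = Agent(hitting_power=3)
barracks = Building()
barracks.agents.append(scout)

# The agent still exists independently of the building that uses it.
barracks.agents.clear()
```

Unlike the imaginary language, this sketch does not flatten the health fields into the agent; they are reached through scout.health.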

As for the method-collection units (mc for short), we add the following:

mc moving
{
    void move(vector2 amount);
}

mc attacking
{
    void attack(entity* attackee, int damage_amount);
}

Now we can complete the logical design:

vc agent : has entity, has transform, has health, 
    has attacking, has moving
{
    int hitting_power;
}

vc building : has entity, has transform, has health, 
    has attacking
{
    list<agent> agents;
    int max_agent_capacity;
}

I think method-collection units are, by nature, users of variable collections; they don't compose them.
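
One way to picture this "user, not owner" idea is the Python sketch below. It is my own approximation, not how the imaginary language would work internally: each method collection is a class with no fields, and its methods receive the variable collection they operate on. The damage clamping and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Transform:
    x: float = 0.0
    y: float = 0.0


@dataclass
class Health:
    remaining_amount: float = 10.0


class Moving:
    """Method collection: carries behavior only, no fields of its own."""

    @staticmethod
    def move(transform: Transform, dx: float, dy: float) -> None:
        # Uses the variable collection passed in; it does not own it.
        transform.x += dx
        transform.y += dy


class Attacking:
    """Method collection with a default implementation of attack."""

    @staticmethod
    def attack(target: Health, damage_amount: int) -> None:
        # Clamp at zero so health never goes negative (an assumption here).
        target.remaining_amount = max(0.0, target.remaining_amount - damage_amount)


position = Transform()
Moving.move(position, 2.0, -1.0)

enemy_health = Health()
Attacking.attack(enemy_health, 4)
```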

Turing completeness

Being Turing complete means that a language can compute anything a Turing machine can, i.e. solve any computable problem. Mainstream programming languages are already Turing complete. Our language actually has fewer features than a normal language, since its units can contain less. We can't just claim that it is Turing complete too; we must prove it.

Turing machines are cool because they can solve any computable problem. So if we show that our language can implement a Turing machine, we have shown by proxy that our language can solve any such problem, although the solutions might not be efficient.

Here's the gist of a Turing machine coded in our language:

mc tape_managing
{
    char read();
    void write(char symbol);
    void move_left();
    void move_right();
}

vc state
{
    string name;
}

mc state_transiting
{
    state goto_next(state* current, char read_symbol);
}

mc machine_running
{
    void step();
    void run();
    bool should_halt();
}

vc turing_machine : 
    has tape_managing, 
    has state_transiting, 
    has machine_running
{
    state current_state;
}

The code doesn't look nice for this use case, and it is only a sketch of the structure rather than a formal proof. But it suggests that this language can produce a solution for any computable problem. The solutions might not look cool, but they are there.
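
To see the same three responsibilities actually run, here is a small Turing machine simulator in Python. It is a sketch under my own assumptions: the Tape class, the TRANSITIONS table, the "invert" and "halt" state names, and the bit-inverting program are all made up for the illustration.

```python
from collections import defaultdict

BLANK = "_"


class Tape:
    """Tape management: read, write, move_left, move_right."""

    def __init__(self, symbols: str) -> None:
        # A defaultdict gives an unbounded tape that is blank by default.
        self.cells = defaultdict(lambda: BLANK, enumerate(symbols))
        self.head = 0

    def read(self) -> str:
        return self.cells[self.head]

    def write(self, symbol: str) -> None:
        self.cells[self.head] = symbol

    def move_left(self) -> None:
        self.head -= 1

    def move_right(self) -> None:
        self.head += 1


# State transitions: (state, read symbol) -> (next state, symbol to write, move).
# This hypothetical table inverts every bit and halts at the first blank.
TRANSITIONS = {
    ("invert", "0"): ("invert", "1", "R"),
    ("invert", "1"): ("invert", "0", "R"),
    ("invert", BLANK): ("halt", BLANK, "R"),
}


def run(tape: Tape, state: str = "invert") -> str:
    """Machine running: step until the halting state is reached."""
    while state != "halt":
        state, symbol, move = TRANSITIONS[(state, tape.read())]
        tape.write(symbol)
        if move == "R":
            tape.move_right()
        else:
            tape.move_left()
    # Collect the non-blank cells in tape order.
    return "".join(tape.cells[i] for i in sorted(tape.cells) if tape.cells[i] != BLANK)


result = run(Tape("1011"))
```

Swapping in a different TRANSITIONS table changes the program without touching the tape or the run loop, which is the same separation the units above aim for.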

So what?

The abstract idea behind all this can be useful when designing software: use smaller units. Not as small as my language's units, but having a less crowded class, for example, often leads to a cleaner design.
It is much, much easier to overengineer and overcomplicate. This type of constraint by nature helps you practice cleaner design.
As for me, I also have too much fun thinking about abstractions. So I'll keep doing it.
