DigitalCrafting

Posted on Feb 18 • Updated on Mar 18

Programming from Top to Bottom - The Basics

#programming #learning #beginners

As promised in a previous article, we are going to dive into how programming actually works from top to bottom. Or in other words: how a CPU can understand what you are writing.

Disclaimer: this series of articles is meant to be a high-level overview that gives you an idea of how programming works and a starting point for further learning. As such, it contains many simplifications.

What is a programming language?

Let's start really high: what is programming?

It's a way of telling a computer what to do.

Kind of obvious, right? Let's go a bit deeper: how do you do that?

By using a programming language like Java.

Correct! Well, mostly. That's because a computer, or rather a CPU, doesn't understand language the way you do. It doesn't matter if it's Python, Java, C or Assembly: a CPU doesn't understand what you are writing. What does the CPU understand then? 1s and 0s. And it's not even the word function represented as 1s and 0s; the CPU has no concept of a function or a class, or of any keyword for that matter. What we, the programmers, understand as a "programming language" is soooooooo far from what actually happens that it's really hard to imagine.

And that's mostly OK. You don't need to understand how circuitry works to be a good developer. However, you should be aware of how much extra work a CPU needs to do in order to run code written in a high-level programming language, and how you can write code that's easier for the CPU to process.

You might think that compilers should take care of that, but compilers are simply translators. Much like the CPU, they do not understand what you want to achieve; they just process text. Of course, there are some optimizations they can do, but they will not rewrite your code to be 100% efficient, not even close.

In order to understand why though, we first need to understand:

What does the CPU do, actually?

At its core, the CPU only does a few things:

binary (bitwise) operations,
math operations,
reading from memory,
writing to memory.

That's mostly it. If you are interested, you can check the Intel 8086 Assembly Manual. It's old, yes, but:

CPUs haven't changed that much in terms of what they basically do; the instructions you find in there are still present in modern CPUs,
because it is limited to the 8086, it's easier to wrap your head around, since that CPU is quite simple compared to modern ones.

In order to perform those operations, the CPU has to somehow store the values it needs to operate on. For that, it has internal storage called registers; you can see them in the Manual on page 24. Each register in the 8086 has 16 bits (modern CPUs have 64-bit registers), which means that an 8086 CPU can operate on chunks of data that are at most 16 bits at a time. That's 2 bytes.

Think about that for a second.

In a world where we operate on whole objects containing various data, how does a CPU handle that?

That's quite a jump in perception, isn't it?
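To make the 16-bit limit a bit more concrete, here is a minimal Java sketch of how a value wider than a register could be handled: a 32-bit addition done in two 16-bit steps, with the carry passed from the low half to the high half. The class and method names are made up for this illustration, and Java itself does not work this way; the code only imitates the idea.

```java
class SixteenBitAdd {
    // Adds two 32-bit values using only 16-bit pieces: low halves first,
    // then high halves plus the carry from the low halves.
    static int add32With16BitRegisters(int a, int b) {
        int aLow = a & 0xFFFF;            // lower 16 bits of a
        int aHigh = (a >>> 16) & 0xFFFF;  // upper 16 bits of a
        int bLow = b & 0xFFFF;            // lower 16 bits of b
        int bHigh = (b >>> 16) & 0xFFFF;  // upper 16 bits of b

        int lowSum = aLow + bLow;         // may need 17 bits
        int carry = lowSum >>> 16;        // the 17th bit: 0 or 1
        int resultLow = lowSum & 0xFFFF;  // the part that fits in 16 bits
        int resultHigh = (aHigh + bHigh + carry) & 0xFFFF;

        // Glue the two 16-bit halves back together.
        return (resultHigh << 16) | resultLow;
    }

    public static void main(String[] args) {
        System.out.println(add32With16BitRegisters(70000, 80000)); // 150000
        System.out.println(add32With16BitRegisters(65535, 1));     // 65536
    }
}
```

One addition in your source code turns into several smaller steps, and that is for a single number, not a whole object.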

Memory

We now know that the CPU can only operate on very small chunks of data, which are stored in memory. So, what does memory look like?

You can think of memory as a huge array of bytes, like this:

Each byte has a unique address (12345 - 12351 in the picture) and holds, well, a single byte. What happens when data occupies more than 1 byte? It just spills over to the next one, and it can occupy as many as it needs, provided there is enough space, of course. For example, the string "Hello" would be stored like this:

The \0 character denotes the end of the string; otherwise we would never know where to stop when reading it.
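If you prefer code to pictures, the same idea can be sketched in Java. This is only a model: MEMORY, writeString and readString are hypothetical names, the array index plays the role of the address, and the \0 terminator imitates C-style strings (Java's own String does not use one).

```java
import java.nio.charset.StandardCharsets;

class MemorySketch {
    // Hypothetical "memory": a flat array of bytes, indexed by address.
    static final byte[] MEMORY = new byte[65536];

    // Copies the text into memory starting at address, followed by a 0 byte.
    static void writeString(int address, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < bytes.length; i++) {
            MEMORY[address + i] = bytes[i];
        }
        MEMORY[address + bytes.length] = 0; // the \0 terminator
    }

    // Reads bytes starting at address until the 0 byte is reached.
    static String readString(int address) {
        StringBuilder result = new StringBuilder();
        for (int i = address; MEMORY[i] != 0; i++) {
            result.append((char) MEMORY[i]);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        writeString(12345, "Hello");
        System.out.println(readString(12345)); // Hello
        System.out.println(MEMORY[12345]);     // 72, the ASCII code of 'H'
        System.out.println(MEMORY[12350]);     // 0, the terminator
    }
}
```

Note that readString is given only a starting address. Without the terminator it would have no way of knowing that the string ends after five bytes.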

That's it. At the lowest level, that's pretty much what it looks like: just an array of bytes and a piece of hardware that moves them around.

Even given that it's a simplified explanation, that's not a lot to work with, is it? And yet, we somehow create very complicated programs that do amazing things.

I will only mention that the bulk of the job is done by compilers (we will learn about them) and hardware. In the future we will take a look at how hardware communicates: how a monitor knows which pixels to light up, how a network card knows when to send data, etc. For now, let's stay in the software realm.

Let's look at the other side of programs now - programming languages.

Types of Languages

There are a few ways to categorize programming languages. For the purpose of this series, I will categorize them like this:

Interpreted - like JavaScript or Python. Meaning you feed what you wrote directly to an interpreter.
Virtual machine languages - like Java. Meaning you still need to compile it, but you feed the compiled code to a VM.
Compiled - like C, C++ or Odin. Languages compiled to machine code for a specific system and CPU architecture.

These are the 3 things I will cover: language interpretation, VMs and machine code compilation. Again, this will be a simplified version, meant to give you an idea, not a complete solution.

Basic flow

For interpreted languages, the flow looks like this:

For VMs and compiled languages it's actually not that different:

Each step can be further divided, but we will talk about that in separate articles. For now, let's just say that whether it's interpretation or compilation, the first step is parsing.

This looks relatively simple, and it mostly is. It's not that hard to interpret:

int c = a + b;
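To get a feel for why a line like that is manageable, here is a rough sketch of what usually comes first in parsing: cutting the text into tokens. TinyTokenizer and tokenize are hypothetical names, and the rules are deliberately naive (letters and digits stick together, every other non-space character stands alone), so treat it as an illustration rather than a real parser.

```java
import java.util.ArrayList;
import java.util.List;

class TinyTokenizer {
    // Splits a statement into tokens: runs of letters/digits form one token,
    // every other non-whitespace character becomes a token of its own.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char ch : source.toCharArray()) {
            if (Character.isLetterOrDigit(ch)) {
                word.append(ch);        // still inside a name or number
                continue;
            }
            if (word.length() > 0) {    // a name or number just ended
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (!Character.isWhitespace(ch)) {
                tokens.add(String.valueOf(ch)); // symbols like = + ;
            }
        }
        if (word.length() > 0) {        // flush a trailing name or number
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int c = a + b;")); // [int, c, =, a, +, b, ;]
    }
}
```

Seven tokens, each with an obvious role: a type, a name, an assignment, two operands, an operator and a terminator.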

The tricky part is how we represent something like this:

public class Test {
    private int x = 3;

    public int multiply(int y) {
        return y*x;
    }
}

Test tester = new Test();
int result = tester.multiply(7);

In memory and CPU instructions?

Spoiler alert: the class does not exist
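As a teaser, here is one way to picture that, sketched in Java on top of a plain byte array. Everything in it is an assumption made for illustration: the names, the 4-byte layout and the byte order are hypothetical, and this is not how the JVM or any particular compiler actually lays out objects. The point is only that the "object" can shrink to an address, and the "method" to a plain function that receives that address.

```java
class FlattenedTest {
    // Hypothetical flat memory: an array of bytes indexed by address.
    static final byte[] MEMORY = new byte[64];

    // Stands in for "new Test()": store the field x = 3 at the given address.
    static void initTest(int address) {
        writeInt(address, 3);
    }

    // Stands in for "tester.multiply(y)": a plain function that is handed
    // the address of the object's data instead of belonging to a class.
    static int multiply(int address, int y) {
        int x = readInt(address);
        return y * x;
    }

    // Writes a 32-bit value as 4 bytes, lowest byte first.
    static void writeInt(int address, int value) {
        for (int i = 0; i < 4; i++) {
            MEMORY[address + i] = (byte) (value >>> (8 * i));
        }
    }

    // Reads 4 bytes, lowest byte first, back into a 32-bit value.
    static int readInt(int address) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            value |= (MEMORY[address + i] & 0xFF) << (8 * i);
        }
        return value;
    }

    public static void main(String[] args) {
        int tester = 16;                  // the "object" is just an address
        initTest(tester);
        int result = multiply(tester, 7);
        System.out.println(result);       // 21
    }
}
```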

Summary

That was a very brief and (hopefully) simple explanation of what the CPU actually does, and it showed the gap between the CPU and what we write. In this series I'm hoping to close that gap just enough for every developer to have a general understanding of the whole process, as I believe it will benefit us in the long run.

Starting with the next article, I'm going to explain how it's done from the top in much more detail, beginning with parsing.

I hope this article piqued your interest and you learned something interesting 😃 See you in the next one!
