DEV Community

Carlos Damázio


Pointers: a soft guide to reference emulation

Hey folks! In this post, I'll give an introduction to referencing in programming, which we often do through what we call pointers.

Many programmers get intimidated when learning new programming languages, especially ones that feature low-level handling like references and memory management, and I know where this insecurity comes from. We are not used to learning computer science concepts through low-level programming anymore. We are often introduced to programming through high-level languages like Python or Java, which have enough abstractions that you can start programming without learning some of these concepts. These languages are aimed at productivity, and abstractions provide it.

DISCLAIMER

This is not meant to be a 100% accurate guide to pointers in C or Go. This is just me trying to explain a concept, which might be useful for someone else who is learning as well. Just keep in mind: each programming language treats pointers in a different manner, and some cases have trade-offs. A situation where a pointer operation pays off in C is not necessarily the same in Go or Rust, where we may not find a substantial improvement in optimization or performance.

Another disclaimer: nobody should be forced to learn this stuff if their craft doesn't depend on it. You might be a web developer who needs to deliver an application to a client. You might be a data scientist who needs to extract insights from datasets. Often, this kind of craft doesn't require learning hardcore CS (Computer Science) concepts to get started. You don't really need to understand how computers work that much. However, whenever you try to write anything that scratches the surface of optimization, or if you're trying to improve your craft (programming, for instance) or expand your knowledge, sooner or later you're going to face these concepts.

With this out of the way, I'd like to write this article for someone like me in 2012, trying to understand this concept. So let's study some pointers, shall we? The pointer is the core and most important structure of referencing (not to be confused with structs). But to understand pointers, you need to understand how variables and their value assignments work.

Variables

In most programming languages, we have what we call data structures, so that we can keep in the computer's memory the data that the program needs in order to work. We have queues, stacks, heaps, trees, graphs, you name it. But there's a simpler structure that holds a single value of a single type: a variable.

Example in C:

    // var.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            int x = 0; // Variable declaration + assignment

            printf("Variable \"x\" = %i\n", x);
            return 0;
    }

Example in Go:

    // var.go
    package main

    import "fmt"

    func main() {
            var x int // Variable declaration
            x = 0     // Variable assignment
            // or, in a single step (use one form or the other, not both in the same scope):
            // x := 0 // Short variable declaration + assignment
            fmt.Printf("Variable \"x\" = %d\n", x)
    }

In a language like C, whenever a local variable is declared inside a function, its value is stored in memory, typically in a region called The Stack. It's a pretty efficient structure that holds local variables. But it's not that simple: the stack holds something called a Stack Frame, which is a slice of the stack reserved for the function that's being executed. This frame typically contains:

  • Function parameters;
  • Caller's address to return the execution to;
  • Local variables from the function.

    #include <stdio.h>
    #include <stdlib.h>

    void function_call()
    {
            int x = 0; // Sticks automatic variable x into the Stack during function execution.
    }

    int main()
    {
            function_call(); // Once it's done, x is popped out from the stack along with its stack frame.
            return 0;
    }

So once a function is done with its execution, its variables are popped off the stack and their memory is reclaimed, the stack pointer goes back to the frame pointer, yada yada yada. The stack has a limited size, so careful usage is required, since you're not the one managing it.

OK, you know what variables are for and how they are handled, but what do pointers have to do with this?

Pointers?

A pointer is a variable that holds the address of another variable, like a proxy to it.

Example in C:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
            int x = 0;
            int *x_ptr;
            x_ptr = &x;

            printf("Address of variable x: %p\n", (void *) x_ptr);
            printf("Value of variable x: %i\n", x);
            printf("Value of variable x through pointer: %i\n", *x_ptr);
            return 0;
    }

Example in Go:

    package main

    import "fmt"

    func main() {
            var x int
            x = 0

            var x_ptr *int // Declares a pointer to an integer type
            x_ptr = &x     // Assigns address of x to x_ptr

            fmt.Printf("Address of variable x: %p\n", x_ptr)
            fmt.Printf("Value of variable x: %d\n", x)
            fmt.Printf("Value of variable x through pointer: %d\n", *x_ptr)
    }

Output:

    Address of variable x: 0x7ffeef42f3a8
    Value of variable x: 0
    Value of variable x through pointer: 0

A pointer itself is just a variable, so in the examples above x_ptr sits on the stack right next to x. But a pointer can also hold the address of memory that is not on the stack, in a region called The Heap (lol): a memory pool that is usually much larger than the stack, from which you request blocks while the program runs (with malloc in C) and get a pointer back. The stack is still the more efficient of the two when we talk about memory allocation, because we only need to move the top of the stack far enough to store the stack frame, which is an O(1) operation, whereas the cost of a heap allocation depends on the allocator's implementation. On top of that, in C the heap's handling is NOT automatic, where in the stack it is: whatever you allocate, you have to free yourself.

You might be wondering...

Why can't I just use the variables? Why do we need pointers?

Which makes sense, right? This is the tricky part of pointers that a lot of people don't get, myself included when I was just learning about this stuff. If we never needed to deal with this thing before, why bother with it, right?! I came to understand it when dealing with scopes. There are 2 ways of passing arguments to a function: pass by value and pass by reference.

  • Pass by value: a copy of the argument (variable) is made inside the function. So changes made to the variable inside the function don't affect the caller.
  • Pass by reference: the address of an argument is passed to the function. So changes made through that address inside the function do affect the caller.
  • C doesn't implement reference data types, so all passes are by value. To emulate a pass by reference, you pass pointers to the variables that you want to change instead of the variables themselves.

Here's an example of the passes in C:

    #include <stdio.h>
    #include <stdlib.h>

    struct Vector {
            int x;
            int y;
    };

    void multiply_scalar_value(struct Vector v, int scalar)
    {
            printf("Address of vector in function passed by value: %p\n", (void *) &v);
            v.x *= scalar;
            v.y *= scalar;
    }

    void multiply_scalar_ref(struct Vector *v, int scalar)
    {
            printf("Address of vector in function passed by reference (emulated by pointers): %p\n", (void *) v);
            v->x *= scalar;
            v->y *= scalar;
    }

    int main()
    {
            struct Vector v = {.x = 1, .y = 1};

            printf("Address of initial V in main: %p\n", (void *) &v);
            printf("v.x and v.y: %i and %i\n\n", v.x, v.y);

            multiply_scalar_value(v, 3);
            printf("Size of Vector: %zu\n", sizeof(v));
            printf("v.x and v.y after pass by value: %i and %i\n\n", v.x, v.y);

            struct Vector *v_ptr;
            v_ptr = &v;
            multiply_scalar_ref(v_ptr, 3);
            printf("Size of Vector's pointer: %zu\n", sizeof(v_ptr));
            printf("v.x and v.y after pass by reference: %i and %i\n", v.x, v.y);

            return 0;
    }

Output:

    Address of initial V in main: 0x7ffeee5724e0
    v.x and v.y: 1 and 1

    Address of vector in function passed by value: 0x7ffeee5724b8
    Size of Vector: 8
    v.x and v.y after pass by value: 1 and 1

    Address of vector in function passed by reference (emulated by pointers): 0x7ffeee5724e0
    Size of Vector's pointer: 8
    v.x and v.y after pass by reference: 3 and 3

Remember when I told you that the stack's size is limited? Every time you call a function passing arguments by value, you're making a stack frame with whole copies of those arguments in memory, whereas when passing by reference you're passing an address to the function instead, not re-creating the variable inside the stack frame. Take a look at the size of the pointer: it's 8 bytes (on a 64-bit machine; 4 bytes on a 32-bit one). In this tiny example the Vector struct also happens to be 8 bytes (two 4-byte ints), so nothing is saved yet, but a pointer's size is fixed: no matter how big the struct gets, you're still passing only 8 bytes to the function.

You might say that, for a program as simple as this one, it's not really a big deal. That's true when you're dealing with primitive data types, such as int, char, double, but how about data structures that can accommodate multiple values and keep growing? How about structs that aggregate not only primitive data types, but other structs and other data structures nested into each other? How about an array of structs? So yes, pointers are very useful when you're using data structures that are bigger than a pointer.

Now, here's another thing: if you're using pointers to save resources, don't use pointers to structs whose size is smaller than a pointer's (8 bytes, or 4 on a 32-bit machine). Just so you know what's up with that, let's add another example.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Node Node;
    typedef struct Node_valued Node_valued;
    typedef struct Node_ref Node_ref;

    struct Node {
            int value;
    };

    struct Node_valued {
            Node current;
            Node next;
    };

    struct Node_ref {
            Node *current;
            Node *next;
    };


    int main()
    {
            Node node1 = {.value = 10};
            Node node2 = {.value = 15};
            Node *node1_ptr = &node1;
            Node *node2_ptr = &node2;
            Node_valued node_valued = {.current = node1, .next = node2};
            Node_ref node_ref = {.current = node1_ptr, .next = node2_ptr};

            printf("Let's compare the sizes\n\n");
            printf("Node 1: %zu\n", sizeof(node1));
            printf("Node 2: %zu\n", sizeof(node2));
            printf("Node pointer 1: %zu\n", sizeof(node1_ptr));
            printf("Node pointer 2: %zu\n", sizeof(node2_ptr));
            printf("Node_valued: %zu\n", sizeof(node_valued));
            printf("Node_ref: %zu\n", sizeof(node_ref));

            return 0;
    }

Output:

    Let's compare the sizes

    Node 1: 4
    Node 2: 4
    Node pointer 1: 8
    Node pointer 2: 8
    Node_valued: 8
    Node_ref: 16

You see: pointer usage doesn't always equal optimization. In this example, Node_ref has two pointers to structs, each one 8 bytes in size (16 in total), while Node_valued has 2 structs, each containing a single integer member "value" of 4 bytes (8 in total). So, resource-wise, using pointers in this case is not a good thing.

Lessons learned

From these examples, we could identify 3 reasons to use pointers, in specific cases, as follows:

  • If you really want to change the caller's variables in the outer scope of the function, you can pass pointers to them as arguments;
  • If you are developing a really complex application that has complex data structures, you might need to think about whether you need those 20 structs lying around in memory, or whether you can do it with a single struct and pointers to it;
  • In struct and union definitions, be mindful of the sizes of their members. If a nested struct is huge, use a pointer to it instead!

However:

  • Referencing can definitely make the code more complex, so be mindful of when you use pointers, or you'll need to upgrade your debugging skills, because it's fucking dreadful.
  • You might think that using a pointer will optimize things, but in reality you can end up bloating your struct as well. Remember that pointers have a fixed size that depends on the architecture.
  • Not every usage of pointers equals better performance. I don't think a pointer to a single integer variable makes a meaningful difference in terms of optimization, even though some algorithms that use pointer arithmetic can.
  • From programming language to programming language, pointer implementations differ. I admittedly don't know the differences between, for example, Go and C. I'm still learning my stuff, so I strongly advise you to do your homework and learn more about your craft. Be curious and own your stuff, then share it with others, just like I'm doing.

Acknowledgements

Thanks for reading this. I really don't know how much use this might be to your career, but it's something that I've been grappling with all these years, so if I can help someone else get through this concept, I'll be glad. And I think it's something I should do every now and then: be more active with my stuff regardless of the engagement it might produce. This really helps me learn more and stay curious about my craft.

So yeah, thanks, I guess. :)
