rwalroth

Posted on Jan 6, 2021 • Edited on Jan 26, 2021

Understanding Objects from a Self Taught Perspective

#oop #python #cpp #javascript

When learning Python or JavaScript, you may be told that "everything is an object" with little to no elaboration. Software architecture tutorials focus almost entirely on object-oriented programming, but rarely talk about what these things are or why we all use them. Especially coming from a higher-level language like Python, it can be very unintuitive what you are doing when you create a class and then use that class to instantiate an object. So I decided to write up a quick description of what objects are, what they are not, and when I think it is a good idea to use them.

What is an object?

Lower level - arranging memory

One of the downsides of starting out with a high-level language like Python is that the computer hardware is completely abstracted away, including the actual data stored in memory. But to understand what an object is, you first need to know how data is stored (and if you want a nice, fun intro to that, please check out Nand Game).

All data in memory is stored as 1s and 0s, in discrete blocks. Typically these blocks are 32 or 64 bits wide (the "bitness" of the computer), each bit being a single 1 or 0. All data, of all types, is stored this way, and that is crucial to grasp. In high-level languages you never get to work with this kind of data directly, but in lower-level languages like C++ these are the built-in types such as int, float, or char. When you create a new variable, the computer grabs a block of memory and fills it with that data. In the picture above, the code on the left results in memory allocated on the right. It's important to note that these variables may or may not be stored next to each other; there is no guarantee where they will end up. The location of a block in memory is its address, and that address is itself stored as a fundamental data type called a pointer. Now we get to the important part: since a computer can only store one thing in one block, and an address fills an entire block on its own, how can a computer store more complicated data structures?

Let's say we want to store an array of 3 ints, like in the code above. In a high-level language you create a list object and work with that, but in lower-level languages you would instead ask the computer for 3 blocks of data and get the pointer to the first of the 3 blocks. Now you have gone beyond the scope of a single block, and you can do that because you know that the computer has reserved 3 blocks for you. Unlike before, the three blocks are guaranteed to be adjacent to each other. This is a "map" of where your data is, and it is pretty straightforward. Traversing your data is as simple as moving one data block at a time.
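To make the "three adjacent blocks" idea concrete from Python, here is a minimal sketch using the standard ctypes module, which exposes C-style types. The names ThreeInts, numbers, start, and step are just illustrative choices for this example, not anything from the post's original figure.

```python
import ctypes

# Ask for three adjacent int-sized blocks (a C-style array of 3 ints).
ThreeInts = ctypes.c_int * 3
numbers = ThreeInts(10, 20, 30)

start = ctypes.addressof(numbers)   # address of the first block
step = ctypes.sizeof(ctypes.c_int)  # size of one block in bytes

# Walk the array by moving one block at a time from the start address.
values = []
for i in range(3):
    address = start + i * step
    values.append(ctypes.c_int.from_address(address).value)

print(values)  # [10, 20, 30]
```

All you really hold is the starting address and the knowledge that three int-sized blocks follow it; everything else is arithmetic.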

Now, let's say you have more things you want to store. Not just an int, but maybe an int and a char and a float. Well, you can ask the computer for three blocks adjacent to each other, and then traverse through them. This is essentially what a class is: a map of how to get to data in memory from a specified starting point. In the above example, all the data are fundamental types, so a compiler could create this with three blocks of adjacent data, but it doesn't have to. When you write a class, what you are doing is laying out what types of data you want to access when dealing with this object. When you create an object instance, the computer goes off and grabs some blocks of data, reserves them for you, and then gives you a pointer to get to that data. The way it's laid out can get very complicated; maybe instead of the data itself it just keeps a list of addresses. This is up to how a programming language is implemented, but in the end it's all the same. A class is a blueprint for how to store data in memory, and every time you create an object the computer will store the data in the exact same way. It will therefore know how to get at all the data given just a pointer to the start and the map.
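As a rough illustration of a class acting as a map, the sketch below uses ctypes.Structure to lay out an int, a char, and a float the way a C compiler would. The class name Record and its field names are made up for this example, and the exact offsets depend on your platform, so the code prints them rather than assuming specific values.

```python
import ctypes

class Record(ctypes.Structure):
    # The "map": which fields exist, their types, and their order.
    _fields_ = [
        ("count", ctypes.c_int),
        ("letter", ctypes.c_char),
        ("value", ctypes.c_float),
    ]

record = Record(7, b"x", 2.5)

# Each field lives at a fixed offset (in bytes) from the start of the object.
offsets = {name: getattr(Record, name).offset for name, _ in Record._fields_}
print(offsets)

# Reading a field by hand: start address + offset, read as the field's type.
start = ctypes.addressof(record)
value = ctypes.c_float.from_address(start + offsets["value"]).value
print(value)  # 2.5
```

Writing record.value does the same lookup for you: start at the object, move by the offset the class dictates, and read what is there.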

These pieces of data are called attributes, and in Python, JavaScript, and C++ they are accessed with the . operator (in C++ they are accessed with the -> operator if using pointers). What the program is doing behind the scenes is going to that object's starting location, checking where that attribute should be located relative to that starting location based on the class, and then returning whatever is at that location in memory. You may have heard that C++ is "unsafe". What that means is that you get to directly manipulate pointers, and could accidentally find yourself outside of the object's data and messing with a different object's data. Python doesn't let you do that; you can only use Python's own logic for traversing memory, which is very robust.

When everything is an object

So what does it mean when "everything is an object"? Well, in a statically typed language like C++, there is a big difference between a fundamental type and an object. When you create a variable of a fundamental type, it refers to just one block of memory. In principle, you could swap this with any other fundamental type, and there are methods for doing that. But when you create an object, the computer will grab a set of blocks for you and populate them with data. Some of these blocks will be addresses and some will be fundamental types. Some of them will be addresses of other objects, which the computer will also need to allocate. In Python and JavaScript, you are not given access to fundamental types. You always create a more complicated object.
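You can see this for yourself in Python with a plain integer. The sketch below is only a quick check, and the size comparison at the end reflects how CPython (the standard interpreter) stores integers, which is an implementation detail rather than a language guarantee.

```python
import sys

number = 5

# Even a plain integer is a full object with a type and methods.
print(isinstance(number, object))  # True
print(type(number).__name__)       # int
print(number.bit_length())         # 3

# In CPython the object also carries bookkeeping data, so it takes up
# more room than a bare 64-bit (8 byte) value would.
size_in_bytes = sys.getsizeof(number)
print(size_in_bytes > 8)
```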

Methods

So far I have talked about objects which only hold data. But there is of course another half to objects: methods, or member functions. Let's use the following example code for a class that stores some data and fits a function to that data. First, implemented without any classes:

myDict = {"data": [], "fitParams": [], "fit": [], "res": []}

def data_function(x, *params):
    # The function we are trying to fit to a data set
    pass

def fit_data(myDict, data_function):
    # Run a fit routine, store parameters, fit, and
    # residual data in fitParams, fit, and res
    pass

def calculate_val(myDict, data_function, x):
    # Return the result at x for the predicted function
    pass

We have a dictionary with some specifically named entries, and we have some functions which accept that dictionary and a function as arguments. Now let's do the same thing with a class:

class myFitter():
    def __init__(self, data, func):
        # Attributes must be set on self to be stored in the object
        self.data = data
        self.fitParams = []
        self.fit = []
        self.res = []
        self.data_function = func

    def fit_data(self):
        # Run a fit routine, store parameters, fit, and
        # residual data in fitParams, fit, and res
        pass

    def calculate_val(self, x):
        # Return the result at x for the predicted function
        pass

Take a look at the arguments in the class methods. You'll notice that the main difference is that myDict is now replaced by self. Essentially, these two are exactly the same. In fact, in languages without any classes at all, this is a pretty common way to write code: first define how data will be stored, then write a set of functions which are all grouped together by their shared use of that data. In Python, each object even has a __dict__ attribute, which is itself a dictionary keeping track of all of that object's attributes.

Getting back to the lower level, the information needed to create an object is the class. This tells the computer what memory is needed and where the program expects to find it. This can also include pointers to functions that will operate on this data. The . operator will direct the computer to some location in memory based on the name, and retrieve either some data or a method. A special aspect of member functions is that they are implicitly or explicitly handed the pointer to the object that called them. In other words, they know they are members of a class and also know who is calling them. That means they can access all the other member functions in the class as well as all the data stored in the object.
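Here is a small sketch of what "handed the pointer to the object" looks like in Python. The Counter class is a made-up example, not part of the fitting code above; the point is only that a method call and a plain function call with the object as the first argument do the same thing.

```python
class Counter:
    def __init__(self, start):
        self.count = start

    def increment(self, step):
        self.count += step
        return self.count

counter = Counter(10)

# The usual call: Python passes `counter` in as `self` for us.
first = counter.increment(1)

# The same call written out explicitly, like a plain function that
# takes the data as its first argument.
second = Counter.increment(counter, 1)

print(first, second)     # 11 12
print(counter.__dict__)  # {'count': 12}
```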

Inheritance

Inheritance just means that instead of drawing your map from scratch, you start from a previously drawn map and extend it. In terms of what the resulting class contains, there is no difference between copying and pasting all the code from the base class and inheriting from it, especially in languages like Python, which lack private members and attributes. Inheritance is a nice way to reuse code or make minor variations on an existing class.
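A minimal sketch of extending an existing map, using two hypothetical classes (Shape and Rectangle) invented for this example:

```python
class Shape:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} with area {self.area()}"

    def area(self):
        return 0

class Rectangle(Shape):
    # Extends the base "map" with two more attributes and replaces area().
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

rect = Rectangle(2, 3)
print(rect.describe())        # rectangle with area 6
print(sorted(rect.__dict__))  # ['height', 'name', 'width']
```

The Rectangle object ends up holding the base class's attribute plus its own, and describe() comes along for free without being retyped.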

What is an object not?

They are not real world objects

Objects are frequently introduced by comparing them to real-life objects, like chairs or buckets. The problem is that computers don't actually store chairs and buckets, they store 1s and 0s. This is something completely glossed over in coding tutorials, but it is very important to understanding objects - objects are convenient ways to store data. Remember, an object is just some data and some methods that manipulate that data. I highly recommend this lecture by Catherine West for a more expert look at why this is a bad way to think about objects, but in brief, real-world objects interact with each other in ways completely different from how computer objects interact. If a person picks up a glass, the glass's positional "data" has been changed. But who changed that data? Not the glass itself. But in OOP, you would expect the glass's data to be private and the glass would always be responsible for moving itself. And this breakdown goes further than that, and even has computer performance implications.

You likely won't care about the performance hit, but in terms of designing your code it can be problematic to think about objects this way. A well-designed object has attributes that are all connected to each other and methods that are all needed and related. If you make a "chair" object, it might have a material, position, size, weight, price, style, and age. Do you ever need all these at once? Maybe style and age get used together with price, but does the position affect the price? Does weight affect age? In this case, why group all these attributes together at all?

Let's say you have a furniture store, and you want to keep track of furniture. You create a chair class, a sofa class, and so on. They each have different types of data, and you then store all the inventory in one large master class or array. Except you only care about the data. You might just want a list of prices to get an approximate inventory valuation. Or you might just want to know how much space you have available for more stock. So instead of a collection of objects, you can have an array of position data, an array of prices, an array of types, etc. This is the "array of structs vs struct of arrays" debate if you want to read further, because there is a case to be made for both. In most cases, however, you will want the struct of arrays approach.
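The two layouts can be sketched with plain dictionaries and lists. The inventory values and field names below are invented purely for illustration.

```python
# Array of structs: one record per piece of furniture.
inventory_aos = [
    {"kind": "chair", "price": 40.0, "floor_area": 0.5},
    {"kind": "sofa", "price": 300.0, "floor_area": 2.0},
    {"kind": "chair", "price": 45.0, "floor_area": 0.5},
]
# To value the stock we have to reach into every record.
total_aos = sum(item["price"] for item in inventory_aos)

# Struct of arrays: one list per attribute, aligned by index.
inventory_soa = {
    "kind": ["chair", "sofa", "chair"],
    "price": [40.0, 300.0, 45.0],
    "floor_area": [0.5, 2.0, 0.5],
}
# The prices are already sitting together in one list.
total_soa = sum(inventory_soa["price"])

print(total_aos, total_soa)  # 385.0 385.0
```

Both give the same answer; the difference is whether the data you want at any one time is gathered in one place or scattered across many records.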

They do not make code cleaner or more performant

One reason I see frequently cited for using objects is to avoid "spaghetti" code. The claim seems to be that by using objects and inheritance you can somehow avoid a tangled set of functions which depend on each other in weird ways. This is not true at all. Classes can very easily become spaghetti if your functions are not clearly written, or if a class ends up with 100 member functions and 20 data attributes. Even worse, you now introduce the issue of complex inheritance hierarchies where a class inherits a class which inherited a class and so on. How do you know which methods are safe to override? How do you even know you're overriding a parent method without double checking the whole family tree?

So then why classes and objects?

Organizing data

Occasionally you might come across someone derisively referring to a programming language feature as "syntactic sugar", meaning it just changes syntax with no underlying performance implications. But every feature of every programming language, right down to the use of letters and numbers, is syntactic sugar. If you're not writing assembly code, you are using syntactic sugar. And that is all classes are, syntactic sugar. Take the following example:

def func(a, b, c, d, e, f, g, h, i, j):
    # Do some stuff with lots of variables
    pass

myDict = {'a': 0, 'b': 2 ...

def func2(myDict):
    # Do the same stuff but with one dictionary
    pass

class Obj():
    def __init__(self, a, b, c, ...
        self.a = a
        self.b = b
        ...

    def func3(self):
        # Do the same stuff but now no arguments at all
        pass

The first example is extremely clunky. No one wants to type that many parameters every time a function gets called, yet sometimes you do need that much data. The second example groups the data together so you can conveniently pass it to a function. This is much better, and helps keep the code more organized too. The final example adds nothing on top of that; it just makes a class. But if func was particularly complicated, you could use the class to break up one big member function into a few different member functions to improve clarity. It is important not to make objects too large, though, otherwise they get unwieldy quickly. Think about objects as convenient ways to organize data, and build them around that.

You can implement complicated data types

Even without taking a data structures course, you might want to build your own data type. Maybe you have a list of dates, and you want to be able to change all the dates at once. You can make a class which wraps a simple list, and have a set_new_start method which sets a new starting point that all dates reference. Maybe you want to store absolute and relative dates. An object helps you control how data is stored and modified.
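One possible way to sketch the date example is shown below. The class name DateSeries and its absolute method are my own inventions for illustration; only the set_new_start name comes from the description above, and storing day offsets is just one assumption about how such a class might work.

```python
from datetime import date, timedelta

class DateSeries:
    """Stores dates as day offsets from a shared start date."""

    def __init__(self, start, dates):
        self.start = start
        # Relative dates: how many days each one is after the start.
        self.offsets = [(d - start).days for d in dates]

    def set_new_start(self, new_start):
        # Moving the start shifts every date at once.
        self.start = new_start

    def absolute(self):
        # Absolute dates are rebuilt from the start and the offsets.
        return [self.start + timedelta(days=n) for n in self.offsets]

series = DateSeries(date(2021, 1, 1), [date(2021, 1, 1), date(2021, 1, 6)])
series.set_new_start(date(2021, 2, 1))
print(series.absolute())
# [datetime.date(2021, 2, 1), datetime.date(2021, 2, 6)]
```

The caller never touches the offsets directly, so the class stays in control of how the dates are stored and changed.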

They help modularize larger code bases

For simple tasks an object should be kept as small as possible, but objects do have one use case I know of where they will get very big and complicated. In larger code bases, with thousands of lines of code, objects are convenient ways to pass around large parts of the software itself. For example, let's say you have a GUI you are building to analyze data. You might have a main window, some inputs, and a display. The main window can be an object which also handles data storage and is a parent to the inputs and display. It can pass messages between these, and the inputs might do some input checks before passing along a message. Breaking code out this way lets you assign one person to one widget or group of widgets. The interaction between objects is well defined, so the individual developers get more freedom in building the internals of their code without worrying about stepping on someone's toes.
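As a toy sketch of that structure, with no real GUI toolkit involved, the classes below (MainWindow, InputBox, Display, and their method names) are all hypothetical stand-ins for actual widgets.

```python
class Display:
    def __init__(self):
        self.shown = []

    def receive(self, message):
        self.shown.append(message)

class InputBox:
    def __init__(self, parent):
        self.parent = parent

    def submit(self, text):
        # Input check before passing the message along to the parent.
        cleaned = text.strip()
        if cleaned:
            self.parent.route(cleaned)

class MainWindow:
    def __init__(self):
        # The main window is the parent of both widgets.
        self.display = Display()
        self.input_box = InputBox(parent=self)

    def route(self, message):
        # Pass messages from the input to the display.
        self.display.receive(message)

window = MainWindow()
window.input_box.submit("  load data.csv ")
window.input_box.submit("   ")  # rejected by the input check
print(window.display.shown)  # ['load data.csv']
```

Whoever owns InputBox can change how it validates text without anyone else caring, as long as it still calls route with a clean message.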

Conclusion

Objects are a great tool for writing code, but not a goal in and of themselves. I highly encourage you to try your next hobby project with no classes at first and see how far you get, then start bundling functions and data when you see places that it would help make the code easier to read.

I hope this was useful, let me know what you think!

DEV Community