At the beginning of this year, I was promoted to intermediate developer 🎊
I decided I needed to better understand data structures and be more comfortable with algorithms. Not because I wanted to leetcode more. I really don't want to leetcode more. But I couldn't shake the feeling that I would be better off if I understood more why data structure A over data structure B.
So I reached out to a friend for help and this is what I've learned 🤓
My mental model of Big O has always been this:
1) A unit of measurement
2) Related to computer science that
3) Describes the complexity of things
From here, I needed to understand why? 💭
As developers, we deal with data.
Sometimes not very much of it, like on a static website. Sometimes a whole lot of it. The multi-millions of users kind. And most of the time, that data is not in a format that we need and we need to manipulate it. Sort it, filter it, or find something. Sometimes we even need to change it into an entirely different format! And how efficiently we do that matters at scale.
What's also true is that there are many ways to solve a problem. This is especially true in programming. You can then think of Big O notation as a way to describe how efficient a solution is relative to another one.
In this post, we'll focus just on the types that apply to arrays but know there are a number of them that you can see below:
Source: Big O Cheatsheet
For arrays, you can have 2 types of time complexities (or Big O):
1) Constant time or O(1)
2) Linear time or O(n)
With Big O, the n refers to the amount of data you are working with.
Let's say you're building a Pokemon app and you have an array of Pokemon.
const kantoStarters = ['Charmander', 'Bulbasaur', 'Squirtle']
If you know the index of Squirtle in the array, you can access it by simply doing
kantoStarters[index]. If this was instead an array of all 151 Kanto Pokemon, the number of steps it takes to access a Pokemon at a known index will be the same as when there were only 3 starter Pokemon because you can go directly to the index of the Pokemon. Hence, access in an array is considered constant time - also known as O(1).
Because constant time takes the least number of steps to complete an operation, it is considered the most efficient. Check that first graph out again!
Let's say instead of knowing where exactly to look for a Pokemon in an array, we have to flip through it like a clothing rack at the mall or files in a filing cabinet. In this case, it would take at worst as many steps as there are Pokemon. Remember that n in Big O notation stands for the amount of data we are working with. So should we have to look through an unordered array of all 151 Pokemon to find a Psyduck it would take us O(n) steps. This is called linear time because given more data we take proportionately more steps.
At this point, since constant time - or O(1) - takes a constant amount of steps, no matter the amount of data versus linear time - or O(n) - which takes proportionately more steps when given more data, we can say that constant time is faster or more efficient than linear time 💨
Once we move into insertion or removal of data into an array, it gets a little nuanced. Let's say we create a new type of Pikachu that wears a coloured party hat (think Nintendo 64 Super Smash Bros) and we wanted to officially recognize it as a Kanto Pokemon: Party Pikachu. If we add Party Pikachu to the end of the list of Pokemon, that would only take one step. Hence, insertion at the end of arrays is constant time - or O(1). The same goes for removal.
It's different, however, if we're trying to insert or remove an item from any other place in the array. Why? If we added Party Pikachu to the beginning, all the indices of the Pokemon after it would have to change because the order of Pokemon is now different. This also applies if Party Pikachu were to be added in the middle of the list. We would have to take as many steps as the number of Pokemon that come after it to change the indices to the new ones. Hence, insertion or removal anywhere but the end is linear time - or O(n).
const originalKantoPokemon = ['Bulbasaur', 'Ivysaur', 'Venusaur'] // and so on // Where Bulbasaur is index 0 const newKantoPokemon = ['Party Pikachu', 'Bulbasaur', 'Ivysaur'] // and so on // Where Bulbasaur is now index 1
You might be thinking, "That's great and all but why do I need to know this?" That's fair. I've been able to have a successful last 4-5 years as a developer without it. Heck, I even got promoted. But there's two possible reasons:
1) You want to get hired at a company that does leetcode.
FAANG companies - also known as Facebook, Amazon, Apple, Netflix, and Google - or similar, are infamous for testing leetcode, algorithms, and data structures in their interview process. If you want to get hired by them, you need to be able to reference Big O when you write an algorithmic solution.
2) You need to come up with efficient solutions.
Even if you avoid interviewing for companies that do leetcode, you will still have to work with data. And unless you can always work with a small amount of data, how performant the solutions you write to handle data will be important. Especially as you become a more senior engineer.
(This will become more apparent as I continue this series by moving into showing actual algorithms. Follow me and stay tuned!)
I'm personally in the second boat but I've since been opening myself up to the idea of the first one. Let's get better first then we'll see 🤡
I was the kind of kid who was, for all intents and purposes, intelligent but didn't identify with being good at STEM subjects despite being an honour roll student throughout my education. Heck, my favourite subject was music. But at some point, you hit a wall that makes you realize your work could go much more smoothly if you deepened your knowledge in a particular area 🚀
My goal is to be able to confidently answer why we should store data a certain way (i.e. dictionary vs. list) or traverse large amounts of data in a certain way, no matter if I'm being asked in an interview or if I simply have to complete a task for a job I'm currently employed for 💃🏻
You can think of what we discussed so far as the building blocks for choosing between multiple ways of handling data. If we know that searching through an array is linear time and we later find out that there's an alternate solution for searching through data that is constant time, which is faster, we might want to use the latter solution. However, there's other things to weigh, like readability and maintainability. More on that another time.
I'll keep learning and be sure to share more 😬
Off to study linked lists!
Keep it candid,
Note: This post focuses more on practical examples than it does on mathematical visuals. This is because not everyone will understand Big O with mathematical graphs. But if you are someone that will, I recommend this.